Preface


‘Mathematical statistics’ has never lost its attractiveness, both as a mathematical
discipline and for its applications in nearly all parts of empirical research.
In recent years it has been found that not everything that is mathematically
optimal is also recommendable in practice if we are not sure whether the
assumptions (for instance, normality) are valid.

As an example we consider the two-sample t-test, which is an optimal
(uniformly most powerful unbiased) test if all assumptions are fulfilled. In
applications, however, we are often not sure that both variances are equal. Then the
approximate Welch test is preferable. Such results have been found by extensive
simulation experiments, which have played a much greater role in recent years (see the
eight international conferences on this topic since 1994 under
http://iws.boku.ac.at).

Therefore in 2016 we wrote a new book in German (Rasch and Schott, 2016)
based on Rasch (1995), incorporating the developments of recent years.

We dropped the first part of the 1995 book, which contained measure and
probability theory, because there are excellent books on these topics such as Billingsley
(2012) and Kallenberg (2002).

Considering the positive response to this book in the statistics community,
we decided to present an English version of our book from 2016. We thank
Alison Oliver for accepting it into Wiley’s publishing programme.

We assume knowledge from probability theory about exponential families
as well as central and non-central t-, χ²- and F-distributions. Because the
definition of exponential families is basic for some chapters, it is repeated in
this book.

Most authors of books about mathematical statistics assume that data
already exist and must be analysed. But we think that the optimal design for
collecting data is at least as important as the statistical analysis. Therefore, in
addition to statistical analysis, we included the design of experiments. The optimal
allocation is described in the chapters on regression analysis. Finally a chapter
about experimental designs is added.


For practical calculations, we present and use IBM SPSS Statistics 24 in some parts
of the book for the statistical analysis, and we thank Dr. Johannes
Gladitz (Berlin) for giving us access to it. Unfortunately, it is not possible to
switch SPSS to British English; therefore, the screens
and our commands show the spelling ‘Analyze’.

The determination of sample sizes can be found together with the description
of the method of analysis, and for sample size determination and other
design problems, we offer the package OPDOE (Optimal Design of Experiments)
under R.

We heartily thank Prof. Dr. Rob Verdooren (Wageningen, Netherlands) for
checking the correctness of the statistics and Sandra Almgren (Kremmling, CO,
USA) for improving the English text.


Rostock, December 2017 Dieter Rasch and Dieter Schott 
Basic Ideas of Mathematical Statistics


Elementary statistical computations have been carried out for thousands of
years. For example, the arithmetic mean of a number of measurements or
observations has been known for a very long time.

Descriptive statistics arose first, starting with the collection of data, for
example, in national censuses or in registers of medical records, followed by the
compression of these data in the form of statistics or graphical representations
(figures). Mathematical statistics developed on the foundation of probability
theory from the end of the 19th century onwards. At the beginning of the 20th century,
Karl Pearson and Sir Ronald Aylmer Fisher were notable pioneers of this new
discipline. Fisher’s book (1925) was a milestone, providing experimenters with such
basic concepts as his well-known maximum likelihood method and analysis of
variance as well as the notions of sufficiency and efficiency. An important
information measure is still called the Fisher information (see Section 1.4).

We do not want to go into detail concerning the historical development. We refer
interested readers to Stigler (1986, 1990). Instead we will describe the current
state of the theory. Nevertheless many stimuli come from real applications.
Hence, from time to time we will include real examples.

Although the probability calculus is the foundation of mathematical statistics,
many practical problems containing statements about random variables cannot
be solved with this calculus alone. For example, we often look for statements
about parameters of distribution functions although we know these functions only
partly or not at all. Mathematical statistics is considered in many
introductory textbooks as the theory of analysing experiments or samples; that is, it is
assumed that a random sample (corresponding to Section 1.1) is given. Often it is
not considered how to get such a random sample in an optimal way. This is
treated later in the design of experiments. In concrete applications, the experiment
first has to be planned, and after the experiment is finished, the analysis has to be
carried out. In theory, however, it is appropriate to determine first the optimal
evaluation, for example, the smallest sample size for a variance optimal estimator.
Hence we proceed in this way and start with the optimal evaluation, and after
this we work out the design problems. An exception is made for sequential
methods, where planning and evaluation are realised together.

Mathematical statistics involves mainly the theory of point estimation, statis- 
tical selection theory, the theory of hypothesis testing and the theory of confi- 
dence estimation. In these areas theorems are proved, showing which 
procedures are the best ones under special assumptions. 

We wish to make clear that the treatment of mathematical statistics on the
one hand and its application to concrete data material on the other hand are
totally different concepts. Although the same terms often occur in both, they must
not be confused. Strictly speaking, the notions of the empirical sphere (hence
of the real world) are related to corresponding models in theory.

If the assumptions for deriving best methods are not fulfilled in practical
applications, the question arises how good these best methods still are. Such questions
are answered by a part of empirical statistics, namely by simulations. We often find
that the assumption of a normal distribution occurring in many theorems is
far from being a good model for many data in applications. In recent years
simulation has developed into a branch of mathematics of its own. This is shown by a series of
international workshops on simulation. The first to sixth workshops took place
in St. Petersburg (Russia) in 1994, 1996, 1998, 2001, 2005 and 2009. The seventh
international workshop on simulation took place in Rimini (Italy) in 2013 and
the eighth one in Vienna (Austria) in 2015.

Because the strength of assumptions has consequences mainly in hypothesis 
testing and confidence estimation, we discuss such problems first in Chapter 3, 
where we introduce the concept of robustness against the strength of 
assumptions. 


1.1 Statistical Population and Samples


1.1.1 Concrete Samples and Statistical Populations 


In the empirical sciences, one character or several characters simultaneously 
(character vector) are observed in certain objects (or individuals) of a popula- 
tion. The main task is to conclude from the sample of observed values to the 
whole set of character values of all objects of this population. The problem is 
that there are objective or economical points of view that do not admit the com- 
plete survey of all character values in the population. We give some examples: 


• The costs of registering all character values would be out of all proportion to the value
of the statement (for instance, measuring the height of all people worldwide
older than 18 years).

• The registration of character values results in the destruction of the objects
(destructive materials testing such as the tear resistance of ropes or
stockings).

• The set of objects is of a hypothetical nature, for example, because some of them do
not exist at the moment of investigation (such as all products of a machine).


We can neglect the few practical cases where all objects of a population can be 
observed and no more extensive population is demanded, because for them 
mathematical statistics is not needed. Therefore we assume that a certain part 
(subset) is chosen from the population to observe a character (or character vec- 
tor) from which we want to draw conclusions to the whole population. We call 
such a part a (concrete) sample (of the objects). The set of character values 
measured for these objects is said to be a (concrete) sample of the character 
values. Each object of the population is to possess such a character value (inde- 
pendent of whether we register the value or not). The set of character values of 
all objects in the population is called the corresponding statistical population. 

A concrete population as well as the (sought-after/relevant) character and
therefore also the corresponding statistical population need to be determined
uniquely. Populations have to be circumscribed first of all in relation to
space and time. In principle it must be clear for an arbitrary real object whether
it belongs to the population or not. In the following we consider some examples:


Original population                              Statistical population

A  Heifers of a certain breed in a certain       A1  Yearly milk yield of these heifers
   region […]                                    A2  Body mass of these heifers after 180 days
                                                 A3  Back height of these heifers

B  Inhabitants of a town on a certain day        B1  Blood pressure of these inhabitants at 6.00 o’clock
                                                 B2  Age of these inhabitants


It is clear that applying conclusions from a sample to the whole population
can be wrong. For example, if the children of a day nursery are chosen from
the population B in the table above, then possibly the blood pressure B1, but
without doubt the age B2, is not applicable to B. Generally we speak of
characters, but if they can have a certain influence on the experimental results, they
are also called factors. The (mostly only a few) character values are said to be
factor levels, and the combinations of factor levels of several factors are called factor level
combinations.

The sample should be representative with respect to all factors that can
influence the character of a statistical population. That means the composition of the
population should be mirrored in the sample of objects. But that is impossible
for small samples and many factor level combinations. For example, there are
already about 200 factor level combinations in population B concerning the
factors age and sex, which cannot be representatively found in a sample of 100
inhabitants. Therefore we recommend avoiding the notion of ‘representative
sample’ because it cannot be defined in a correct way.

Samples should not be assessed according to the elements included but
according to the way these elements have been selected. This way of selecting a sample is
called the sampling procedure. It can be applied either to the objects as statistical
units or to the population of character values (e.g. in a database). In the latter
case the sample of character values arises immediately. In the first case the
character must first be registered for the selected objects. Both procedures (but not
necessarily the created samples) are equivalent if the character value is registered
for each selected object. This is assumed in this chapter. It is not the case in
so-called censored samples, where the character values could not be registered for all
units of the experiment. For example, if the determination of lifespans of objects
(such as electronic components) is terminated at a certain time, measured values of
objects with lifespans longer than the time of termination are missing.

In the following we do not distinguish between samples of objects and samples of
character values; the definitions hold for both.


Definition 1.1 A sampling procedure is a rule of selecting a proper subset, 
named sample, from a well-defined finite basic set of objects (population, uni- 
verse). It is said to be at random if each element of the basic set has the same 
probability p to come into the sample. A (concrete) sample is the result of a sam- 
pling procedure. Samples resulting from a random sampling procedure are said 
to be (concrete) random samples. 


There are many random sampling procedures in the theory of samples (see, e.g.
Cochran and Böing, 1972; Kauermann and Küchenhoff, 2011; Quatember, 2014)
that can be used in practice. In the following, basic sets of objects are mostly called (statistical)
populations or sometimes, synonymously, (statistical) universes.


1.1.2 Sampling Procedures 


Concerning random sampling procedures, we distinguish (among other cases) 


• Simple sampling, where each element of the population has the same
probability to come into the sample.

• Stratified sampling, where random sampling is done within previously
defined (disjoint) subclasses (strata) of the population. This kind of sampling
is only at random as a whole if the sampling probabilities within the classes are
chosen proportionally to the cardinalities of the classes.

• Cluster sampling, where the population is again divided into disjoint
subclasses (clusters), but the sampling is done not among the objects of
the population itself but among the clusters. In the selected clusters all objects
are registered. This kind of selection is often used for area samples. It is only at
random corresponding to Definition 1.1 if the clusters contain the same
number of objects.

• Multistage sampling, where at least two stages of sampling are taken. In
the two-stage case the population is first decomposed into disjoint subsets
(primary units). Then random sampling is done in all primary units to get
secondary units. Multistage sampling is favourable if the population has a
hierarchical structure (e.g. country, province, towns in the province). It is
at random corresponding to Definition 1.1 if the primary units contain the
same number of secondary units.

• (Constantly) sequential sampling, where the sample size is not fixed at the
beginning of the sampling procedure. At first a small sample is taken and
analysed. Then it is decided whether the obtained information is sufficient, for
example, to reject or to accept a given hypothesis (see Chapter 3), or if more
information is needed by selecting a further unit.


Both a random sampling (procedure) and an arbitrary sampling (procedure) 
can result in the same concrete sample. Hence we cannot prove by inspecting 
the sample itself whether the sample is randomly chosen or not. We have to 
check the sampling procedure used. 

For pure random sampling, Definition 1.1 is applied directly: each object
in the population of size N is drawn with the same probability p. The number of
objects in a sample is called the sample size, mostly denoted by n.

The most important case of pure random sampling occurs if the objects drawn
from a population are not put back. An example is a lottery, where n numbers are
drawn from N given numbers (in Germany the well-known Lotto uses N = 49 and
n = 6). Using unrestricted sampling of size n, each of the $M = \binom{N}{n}$
possible subsets has the same probability $p = \frac{1}{M}$ to come into the sample.
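The Lotto figures can be checked numerically. The following is a minimal Python sketch and not part of the original text; the function name `simple_random_sample` and the fixed seed are illustrative assumptions.

```python
import math
import random


def simple_random_sample(population, n, rng):
    """Draw n distinct objects without replacement.

    random.Random.sample selects every subset of size n with equal probability.
    """
    return rng.sample(population, n)


N, n = 49, 6
M = math.comb(N, n)        # number of possible subsets of size n
p_subset = 1 / M           # probability of one particular subset
p_object = n / N           # probability that a given number is in the sample

rng = random.Random(2017)  # fixed seed only to make the sketch reproducible
ticket = sorted(simple_random_sample(list(range(1, N + 1)), n, rng))
print(M, p_subset, p_object, ticket)
```

The count `M` is the binomial coefficient from the text, and `p_object` is the common inclusion probability p of Definition 1.1.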

As mentioned before, the sample itself is only at random if a random sampling
method was used. But people become suspicious at once if the sample is
extreme. If somebody wins the top prize after buying one of 10,000 possible lottery tickets,
then this is possible although rather unlikely. It can happen at random, or in
other words, it can be the result of (correct) random sampling. But if this
person wins the top prize with a single ticket in three consecutive lotteries of the
mentioned kind, and if it turns out additionally that the person is the brother of the
ticket seller, then doubts are justified that there was something fishy going on. We
would refuse to accept such unlikely events and would suppose that something
is wrong. In our lottery case, we would assume that the selection was not at
random and that cheats were at work. Nevertheless, there is an extremely small
probability that this event occurs at random, namely, p = 1/1,000,000,000,000.

Incidentally, the strategy of statistical tests in Chapter 3 is to refuse models 
(facts) under which observed events possess a very small probability and instead 
to accept models where these events have a larger probability. 
Pure random sampling also occurs if the random sample is obtained by replacing the
objects immediately after drawing and observing them, so that each object has the same
probability to come into the sample. Hence, the population
always has the same number of objects before a new object is taken. That is only
possible if the observation of objects works without destroying or changing
them (examples where that is impossible are tensile breaking tests, medical
examinations of killed animals, felling of trees and harvesting of food). The
discussed method is called simple random sampling with replacement.

If a population of N objects is given and n objects are selected, then n < N holds
for sampling without replacement, while for sampling with replacement objects can occur
several times in the sample and n > N is possible.

A method that can sometimes be realised more easily is systematic
sampling with random start. It is applicable if the objects of the finite sampling set
are numbered from 1 to N and the sequence is not related to the character
considered. If the quotient m = N/n is a natural number, a natural number i between
1 and m is chosen at random, and the sample is collected from the objects with
numbers i, m + i, 2m + i, ..., (n − 1)m + i. Detailed information about this case
and the case where the quotient m is not natural can be found in Rasch et al.
(2008) in Method (1/31/1210).
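The selection rule for systematic sampling with random start can be sketched in a few lines of Python. This is only an illustration for the case that m = N/n is a natural number; the function name `systematic_sample` is an assumption, and the case of a non-natural quotient is deliberately not covered.

```python
import random


def systematic_sample(N, n, rng):
    """Systematic sampling with random start when m = N/n is a natural number."""
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch covers only the case where N/n is a natural number")
    m = N // n
    i = rng.randint(1, m)                 # random start with 1 <= i <= m
    return [j * m + i for j in range(n)]  # object numbers i, m+i, ..., (n-1)m+i


rng = random.Random(1)                    # fixed seed only for reproducibility
sample = systematic_sample(1000, 100, rng)
print(sample[:5], sample[-1])
```

Since the start i is the only random element, there are exactly m different samples, and every object belongs to exactly one of them.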

The stratified sampling already mentioned is advantageous if the population
of size N is decomposed in a content-relevant manner into s disjoint
subpopulations of sizes $N_1, N_2, \ldots, N_s$. Of course, the population can sometimes be
divided into such subpopulations following the levels of a supposed interfering
factor. The subpopulations are denoted as strata. If a sample of size n is
to be drawn from such a population, an unrestricted sampling procedure carries the
danger that not all strata are considered at all, or at least not in an appropriate
way. Therefore in this case a stratified random sampling procedure is
favourable. Then partial samples of size $n_i$ are collected from the ith stratum (i = 1,
2, ..., s), where pure random sampling procedures are used in each stratum.
This leads to a random sampling procedure for the whole population if the
numbers $n_i/n$ are chosen proportional to the numbers $N_i/N$.
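Proportional allocation can be illustrated as follows. This is a hedged Python sketch, not the authors' implementation: the helper names `proportional_allocation` and `stratified_sample` are assumptions, and the sketch requires that all $n N_i / N$ are integers.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Return n_i = n * N_i / N for each stratum (assumed to be integers)."""
    N = sum(stratum_sizes)
    if any((n * N_i) % N for N_i in stratum_sizes):
        raise ValueError("n * N_i / N is not an integer for every stratum")
    return [n * N_i // N for N_i in stratum_sizes]


def stratified_sample(strata, n, rng):
    """Draw a pure random sample without replacement inside every stratum."""
    sizes = proportional_allocation([len(s) for s in strata], n)
    return [rng.sample(stratum, n_i) for stratum, n_i in zip(strata, sizes)]


# Two strata with N_1 = 400 and N_2 = 600 objects, total sample size n = 100
strata = [list(range(0, 400)), list(range(400, 1000))]
parts = stratified_sample(strata, 100, random.Random(3))
print([len(part) for part in parts])
```

With this allocation every object has the inclusion probability $n_i/N_i = n/N$, which is the condition of Definition 1.1.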

While for stratified random sampling objects are selected from each
subset, for multistage sampling, subsets or objects are selected at random at
each stage as described below. Let the population consist of k disjoint subsets
of size $N_0$, the primary units, in the two-stage case. Further, it is supposed that
the character values in the single primary units differ only at random, so that
objects need not be selected from all primary units. If the desired sample size
is $n = r n_0$ with r < k, then, in the first step, r of the k given primary units are
selected using a pure random sampling procedure. In the second step $n_0$ objects
(secondary units) are chosen from each selected primary unit, again applying pure random
sampling. The number of possible samples is $\binom{k}{r}\binom{N_0}{n_0}$, and each
object of the population has the same probability $p = \frac{r\,n_0}{k\,N_0}$ to reach the sample
corresponding to Definition 1.1.


Example 1.1 A population has N = 1000 objects. A sample of size n = 100
should be drawn without replacement of objects. Table 1.1 lists the number
of possible samples using the discussed sampling methods. The probability of
selection for each object is p = 0.1.


Table 1.1 Possible samples using different sampling procedures.

Sampling procedure                            Number K of possible samples

Simple random sampling                        $K = \binom{1000}{100} > 10^{139}$

Systematic sampling with random start,        $K = 10$
  k = 10

Stratified random sampling,                   $K = \binom{50}{5}^{20}$
  k = 20, N_i = 50, i = 1, ..., 20

Stratified random sampling,                   $K = \binom{100}{10}^{10}$
  k = 10, N_i = 100, i = 1, ..., 10

Stratified random sampling,                   $K = \binom{200}{20}^{5} = 1.6135878 \cdot 10^{[\ldots]}$
  k = 5, N_i = 200, i = 1, ..., 5

Stratified random sampling,                   $K = \binom{400}{40}\binom{600}{60} = 5.4662414 \cdot 10^{[\ldots]}$
  k = 2, N_1 = 400, N_2 = 600

Two-stage sampling,                           $K = \binom{20}{4}\binom{50}{25} = 6.1245939 \cdot 10^{17}$
  k = 20, N_0 = 50, r = 4

Two-stage sampling,                           $K = \binom{20}{5}\binom{50}{20} = 7.3069131 \cdot 10^{17}$
  k = 20, N_0 = 50, r = 5

Two-stage sampling,                           $K = \binom{10}{2}\binom{100}{50} = 4.5401105 \cdot 10^{30}$
  k = 10, N_0 = 100, r = 2

Two-stage sampling,                           $K = \binom{10}{4}\binom{100}{25} = 5.0929047 \cdot 10^{25}$
  k = 10, N_0 = 100, r = 4

Two-stage sampling,                           $K = \binom{5}{2}\binom{200}{50} = 4.5385838 \cdot 10^{48}$
  k = 5, N_0 = 200, r = 2

Two-stage sampling,                           $K = \binom{2}{1}\binom{500}{100}$
  k = 2, N_0 = 500, r = 1
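The two-stage entries of Table 1.1 can be recomputed with exact integer arithmetic. The sketch below is an illustration only: it evaluates the expression for the number of possible samples exactly as it is stated in the text for the two-stage case, and the function names are hypothetical.

```python
from math import comb


def two_stage_count(k, N0, r, n):
    """Evaluate binom(k, r) * binom(N0, n0) with n0 = n / r, as stated in the text."""
    if n % r != 0:
        raise ValueError("n must be a multiple of r")
    n0 = n // r
    return comb(k, r) * comb(N0, n0)


def selection_probability(k, N0, r, n):
    """Probability r * n0 / (k * N0) that a given object reaches the sample."""
    return (r * (n // r)) / (k * N0)


n = 100
designs = [(20, 50, 4), (20, 50, 5), (10, 100, 2), (10, 100, 4), (5, 200, 2)]
counts = {d: two_stage_count(d[0], d[1], d[2], n) for d in designs}
for (k, N0, r), K in counts.items():
    print(k, N0, r, f"{K:.7e}", selection_probability(k, N0, r, n))
```

For every design listed the selection probability equals 0.1, in agreement with Example 1.1.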


1.2 Mathematical Models for Population and Sample 


In mathematical statistics notions are defined that are used as models
(generalisations) for the corresponding empirical notions. The population, which
corresponds to a frequency distribution of the character values, is related to the
model of a probability distribution. The concrete sample selected by a random
procedure is modelled by the realised (theoretical) random sample. These
model concepts are adequate if the size N of the population is very large
compared with the size n of the sample.


Definition 1.2 An n-dimensional random variable

\[ \boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T, \quad n \ge 1 \]

with components $\boldsymbol{y}_i$ is said to be a random sample, if

• All $\boldsymbol{y}_i$ have the same distribution characterised by the distribution function
$F(y_i, \theta) = F(y, \theta)$ with the parameter (vector) $\theta \in \Omega \subset R^p$ and

• All $\boldsymbol{y}_i$ are stochastically independent from each other, that is, the
distribution function $F(Y, \theta)$ of $\boldsymbol{Y}$ satisfies the factorisation

\[ F(Y, \theta) = \prod_{i=1}^{n} F(y_i, \theta), \quad \theta \in \Omega \subset R^p. \]

The values $Y = (y_1, y_2, \ldots, y_n)^T$ of a random sample $\boldsymbol{Y}$ are called realisations.
The set {Y} of all possible realisations of $\boldsymbol{Y}$ is called the sample space.


In this book the random variables are printed in bold characters, and the
sample space {Y} always belongs to an n-dimensional Euclidean space, that is,
$\{Y\} \subset R^n$.

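The function

\[ L(Y, \theta) = \begin{cases} f(Y, \theta) = \dfrac{\partial F(Y, \theta)}{\partial Y}, & \text{for continuous } \boldsymbol{y}, \\[1ex] p(Y, \theta), & \text{for discrete } \boldsymbol{y}, \end{cases} \]

with the probability function $p(Y, \theta)$ and the density function $f(Y, \theta)$, respectively,
is said to be, for given Y as a function of θ, the likelihood function (of the
distribution).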
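To illustrate the likelihood function as a function of θ for a fixed realisation Y, the following Python sketch assumes, purely for illustration, that the components are N(θ, 1)-distributed; by the independence in Definition 1.2 the joint density is then the product of the component densities. The data values are made up.

```python
import math


def normal_density(y, theta):
    """Density of N(theta, 1) at the point y."""
    return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2 * math.pi)


def likelihood(Y, theta):
    """L(Y, theta) as the product of f(y_i, theta) over independent components."""
    value = 1.0
    for y in Y:
        value *= normal_density(y, theta)
    return value


Y = [1.2, 0.7, 1.9, 1.4]  # made-up realisation of a random sample
for theta in (0.0, 1.0, 1.3, 2.0):
    print(theta, likelihood(Y, theta))
```

The realisation Y stays fixed while θ varies; in this model the value of the likelihood is largest at the arithmetic mean of the realisation.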
Random sample can have two different meanings, namely:


• Random sample as random variable $\boldsymbol{Y}$ corresponding to Definition 1.2
• (Concrete) random sample as a subset of a population, which was selected by
a random sampling procedure.


The realisation Y of a random sample $\boldsymbol{Y}$ is called a realised random sample.

The random sample $\boldsymbol{Y}$ is the mathematical model of the simple random sampling
procedure, where concrete random sample and realised random sample
correspond to each other also in the symbolism.

We describe in this book the ‘classical’ philosophy, where $\boldsymbol{Y}$ is distributed with the
distribution function $F(Y, \theta)$ with the fixed (not random) parameter $\theta \in \Omega \subset R^p$.
Besides, there is the philosophy of Bayes, where a random θ is supposed, which is
distributed a priori with a parameter assumed to be known. In the empirical
Bayesian method, the a priori distribution is estimated from the data collected.


1.3 Sufficiency and Completeness 


A random variable involves certain information about the distribution and its
parameters. Mainly for large n (say, n > 100), it is useful to condense the components
of a random sample in such a way that as few new random variables as possible
contain as much of this information as possible. This vaguely formulated
concept will be made more precise step by step, up to the notion of a minimal sufficient
statistic. First, we repeat the definition of an exponential family.

The distribution of a random variable $\boldsymbol{y}$ with parameter vector $\theta = (\theta_1, \theta_2, \ldots, \theta_k)^T$
belongs to a k-parametric exponential family if its likelihood function can
be written as

\[ f(y, \theta) = h(y)\, e^{\sum_{i=1}^{k} \eta_i(\theta)\, T_i(y) - B(\theta)}, \]

where the following conditions hold:


• $\eta_i$ and B are real functions of θ and B does not depend on y.
• The function h(y) is non-negative and does not depend on θ.


The exponential family is in canonical form with the so-called natural
parameters $\eta_i$, if its elements can be written as

\[ f(y, \eta) = h(y)\, e^{\sum_{i=1}^{k} \eta_i T_i(y) - A(\eta)} \quad \text{with } \eta = (\eta_1, \ldots, \eta_k)^T. \]

Let $(P_\theta, \theta \in \Omega)$ be a family of distributions of random variables $\boldsymbol{y}$ with the
distribution function $F(y, \theta)$, $\theta \in \Omega$. The realisations $Y = (y_1, \ldots, y_n)^T$ of the random
sample

\[ \boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T, \]

where the components $\boldsymbol{y}_i$ are distributed as $\boldsymbol{y}$ itself, lie in the sample space {Y}.
According to Definition 1.2 the distribution function $F(Y, \theta)$ of a random sample
$\boldsymbol{Y}$ is, just as $F(y, \theta)$, uniquely determined.


Definition 1.3 A measurable mapping $M = M(Y) = [M_1(Y), \ldots, M_r(Y)]^T$, $r \le n$,
of {Y} on a space {M}, which does not depend on $\theta \in \Omega$, is called a statistic.


Definition 1.4 A statistic M is said to be sufficient relative to a distribution
family $(P_\theta, \theta \in \Omega)$ or relative to $\theta \in \Omega$, respectively, if the conditional distribution
of a random sample $\boldsymbol{Y}$ is independent of θ for given $\boldsymbol{M} = M(\boldsymbol{Y}) = M(Y)$.


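Example 1.2 Let the components of a random sample $\boldsymbol{Y}$ satisfy a two-point
distribution with the values 1 and 0. Further, let $P(\boldsymbol{y}_i = 1) = p$ and $P(\boldsymbol{y}_i = 0) = 1 - p$,
0 < p < 1. Then $\boldsymbol{M} = M(\boldsymbol{Y}) = \sum_{i=1}^{n} \boldsymbol{y}_i$ is sufficient relative (corresponding)
to $\theta \in (0, 1) = \Omega$. To show this, we have to prove that
$P(\boldsymbol{Y} = Y \mid \sum_{i=1}^{n} \boldsymbol{y}_i = M)$ is independent of p. Now it is

\[ P(\boldsymbol{Y} = Y \mid M) = \frac{P(\boldsymbol{Y} = Y, \boldsymbol{M} = M)}{P(\boldsymbol{M} = M)}, \quad M = 0, 1, \ldots, n. \]

We know from probability theory that $\boldsymbol{M} = M(\boldsymbol{Y}) = \sum_{i=1}^{n} \boldsymbol{y}_i$ is binomially
distributed with the parameters n and p. Hence it follows that

\[ P(\boldsymbol{M} = M) = \binom{n}{M} p^M (1 - p)^{n - M}, \quad M = 0, \ldots, n. \]

Further we get with $y_i = 0$ or $y_i = 1$ and $A(M) = \{Y \mid M(Y) = M\}$ the result

\[ \begin{aligned}
P[\boldsymbol{Y} = Y, M(\boldsymbol{Y}) = M] &= P(\boldsymbol{y}_1 = y_1, \ldots, \boldsymbol{y}_n = y_n)\, I_{A(M)}(Y) \\
&= \prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i}\, I_{A(M)}(Y) \\
&= p^{\sum_{i=1}^{n} y_i} (1 - p)^{n - \sum_{i=1}^{n} y_i}\, I_{A(M)}(Y) \\
&= p^M (1 - p)^{n - M}.
\end{aligned} \]

Consequently it is $P(\boldsymbol{Y} = Y \mid M) = \dfrac{1}{\binom{n}{M}}$, and this is independent of p.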
This way of proving sufficiency is rather laborious, but we can also apply it for 
continuous distributions as the next example will show. 
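The result of Example 1.2 can be verified by brute force for small n. The following Python sketch (function names are illustrative assumptions) computes the conditional probability directly from the joint probabilities by enumerating all 0-1 vectors with the same sum.

```python
from itertools import product
from math import comb


def joint_prob(Y, p):
    """P(Y = Y) for independent two-point components with P(y_i = 1) = p."""
    M = sum(Y)
    return p ** M * (1 - p) ** (len(Y) - M)


def conditional_prob(Y, p):
    """P(Y = Y | sum of components = M), computed by enumeration."""
    n, M = len(Y), sum(Y)
    marginal = sum(joint_prob(Z, p) for Z in product((0, 1), repeat=n) if sum(Z) == M)
    return joint_prob(Y, p) / marginal


Y = (1, 0, 1, 0)
values = [conditional_prob(Y, p) for p in (0.1, 0.5, 0.9)]
print(values, 1 / comb(len(Y), sum(Y)))
```

For every p the conditional probability equals 1 divided by the binomial coefficient of n over M, which is what sufficiency of the sum requires.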
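Example 1.3 Let the components $\boldsymbol{y}_i$ of a random sample $\boldsymbol{Y}$ of size n be
distributed as $N(\mu, 1)$ with expected value μ and variance $\sigma^2 = 1$. Then
$\boldsymbol{M} = \sum_{i=1}^{n} \boldsymbol{y}_i$ is sufficient relative to $\mu \in R^1 = \Omega$. To show this we first remark that
$\boldsymbol{Y}$ is distributed as $N(\mu e_n, I_n)$. Then we apply the one-to-one transformation

\[ Z = AY = \Big( z_1 = \sum_{i=1}^{n} y_i,\; y_2 - y_1,\; \ldots,\; y_n - y_1 \Big)^T \quad \text{with} \quad A = \begin{pmatrix} 1 & e_{n-1}^T \\ -e_{n-1} & I_{n-1} \end{pmatrix}, \]

where |A| = n. We write $Z = (z_1, Z_2^T)^T = \big( \sum_{i=1}^{n} y_i,\; y_2 - y_1,\; \ldots,\; y_n - y_1 \big)^T$ and
recognise that

\[ \operatorname{cov}(Z_2, z_1) = \operatorname{cov}(Z_2, M) = \operatorname{cov}\big( (-e_{n-1}, I_{n-1})\,Y,\; e_n^T Y \big) = 0_{n-1}. \]

Considering the assumption of normal distribution, the variables $\boldsymbol{M}$ and $\boldsymbol{Z}_2$
are stochastically independent. Consequently $\boldsymbol{Z}_2$, but also $\boldsymbol{Z}_2 \mid M$ and $\boldsymbol{Z} \mid M$, are
independent of μ. Taking into account that the mapping Z = AY is biunique,
also $\boldsymbol{Y} \mid M$ is independent of μ. Hence $\boldsymbol{M} = \sum_{i=1}^{n} \boldsymbol{y}_i$ is sufficient relative to $\mu \in R^1$.
With a sufficient $\boldsymbol{M} = \sum_{i=1}^{n} \boldsymbol{y}_i$ and a real number $c \ne 0$, also $c\boldsymbol{M}$ is sufficient,
that is, $\frac{1}{n} \sum_{i=1}^{n} \boldsymbol{y}_i = \bar{\boldsymbol{y}}$ is sufficient.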

Since sufficiency plays such a crucial part in mathematical statistics,
we need simpler methods for proving sufficiency and mainly for finding sufficient
statistics. The following theorem is useful in this direction.


Theorem 1.1 (Decomposition Theorem)

Let a distribution family $(P_\theta, \theta \in \Omega)$ of a random sample $\boldsymbol{Y}$ be given that is
dominated by a finite measure ν. The statistic M(Y) is sufficient relative to θ, if the
Radon–Nikodym density $f_\theta$ of $P_\theta$ with respect to ν can be written as

\[ f_\theta(Y) = g_\theta[M(Y)]\, h(Y) \tag{1.1} \]

ν-almost everywhere. Then the following holds:


• The ν-integrable function $g_\theta$ is non-negative and measurable.
• h is non-negative and h(Y) = 0 is fulfilled only for a set of $P_\theta$-measure 0.


The general proof came from Halmos and Savage (1949); it can be also found, 
for example, in Bahadur (1955) or Lehmann and Romano (2008). 

In the present book we work only with discrete and continuous probability 
distributions satisfying the assumptions of Theorem 1.1. A proof of the theorem 
for such distributions is given in Rasch (1995). We do not want to repeat it here. 

For discrete distributions Theorem 1.1 means that the probability function is
of the form

\[ p(Y, \theta) = g[M(Y), \theta]\, h(Y). \tag{1.2} \]

For continuous distributions the density function has the form

\[ f(Y, \theta) = g[M(Y), \theta]\, h(Y). \tag{1.3} \]


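Corollary 1.1 If the distribution family $(P^*(\theta), \theta \in \Omega)$ of the random variable $\boldsymbol{y}$ is
a k-parametric exponential family with the natural parameter η and the
likelihood function

\[ L^*(y, \eta) = h^*(y)\, e^{\sum_{j=1}^{k} \eta_j M_j(y) - A(\eta)}, \tag{1.4} \]

then, denoting the random sample by $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T$,

\[ M(Y) = \Big( \sum_{i=1}^{n} M_1(y_i), \ldots, \sum_{i=1}^{n} M_k(y_i) \Big)^T \tag{1.5} \]

is sufficient relative to θ.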


Proof: It is

\[ L(Y, \eta) = \prod_{i=1}^{n} h^*(y_i)\; e^{\sum_{j=1}^{k} \eta_j \sum_{i=1}^{n} M_j(y_i) - nA(\eta)}, \tag{1.6} \]

which is of the form (1.2) and (1.3), respectively, where $h(Y) = \prod_{i=1}^{n} h^*(y_i)$
and θ = η.


Definition 1.5 Two likelihood functions, $L_1(Y_1, \theta)$ and $L_2(Y_2, \theta)$, are said to be
equivalent, denoted by $L_1 \sim L_2$, if

\[ L_1(Y_1, \theta) = a(Y_1, Y_2)\, L_2(Y_2, \theta) \tag{1.7} \]

with a function $a(Y_1, Y_2)$ that is independent of θ.
Then it follows from Theorem 1.1 


Corollary 1.2 M(Y) is sufficient relative to θ if and only if (iff) the likelihood
function $L_M(M, \theta)$ of $\boldsymbol{M} = M(\boldsymbol{Y})$ is equivalent to the likelihood function of a
random sample $\boldsymbol{Y}$.


Proof: If M(Y) is sufficient relative to θ, then because of

\[ L_M(M, \theta) = a(Y)\, L(Y, \theta), \quad a(Y) > 0, \tag{1.8} \]

$L_M(M, \theta)$ has together with $L(Y, \theta)$ the form (1.1). Conversely, (1.8) implies that
the conditional distribution of a random sample $\boldsymbol{Y}$ is, for given M(Y) = M,
independent of θ.
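Example 1.4 Let the components $\boldsymbol{y}_i$ of a random sample $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T$
be distributed as $N(\mu, 1)$. Then it is

\[ L(Y, \mu) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}(Y - \mu e_n)^T (Y - \mu e_n)} = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2} \sum_{i=1}^{n} (y_i - \bar{y})^2}\, e^{-\frac{n}{2}(\bar{y} - \mu)^2}. \tag{1.9} \]

Since $M(\boldsymbol{Y}) = \bar{\boldsymbol{y}}$ is distributed as $N(\mu, \frac{1}{n})$, we get

\[ L_M(\bar{y}, \mu) = \sqrt{\frac{n}{2\pi}}\, e^{-\frac{n}{2}(\bar{y} - \mu)^2}. \tag{1.10} \]

Hence $L_M(\bar{y}, \mu) \sim L(Y, \mu)$ holds, and $\bar{\boldsymbol{y}}$ is sufficient relative to μ.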
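The equivalence used in Example 1.4 can be checked numerically: the ratio of the likelihood of the sample to the likelihood of the mean must not depend on μ. The Python sketch below is only an illustration with made-up data; the function names are assumptions.

```python
import math


def likelihood_sample(Y, mu):
    """Likelihood of a realised sample with N(mu, 1)-distributed components."""
    n = len(Y)
    return (2 * math.pi) ** (-n / 2) * math.exp(-0.5 * sum((y - mu) ** 2 for y in Y))


def likelihood_mean(ybar, mu, n):
    """Likelihood of the sample mean, which is N(mu, 1/n)-distributed."""
    return math.sqrt(n / (2 * math.pi)) * math.exp(-0.5 * n * (ybar - mu) ** 2)


Y = [0.3, -1.2, 2.1, 0.8]  # made-up realisation
ybar = sum(Y) / len(Y)
ratios = [likelihood_sample(Y, mu) / likelihood_mean(ybar, mu, len(Y))
          for mu in (-1.0, 0.0, 0.5, 2.0)]
print(ratios)
```

All ratios agree up to rounding error; their common value plays the role of the factor a in Definition 1.5, which depends on the data only.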
Generally we immediately obtain from Definition 1.4 the 


Corollary 1.3 If c > 0 is a real number chosen independently of θ and M(Y) is
sufficient relative to θ, then c M(Y) is also sufficient relative to θ.


Hence, for example, by putting $M = \sum_{i=1}^{n} y_i$ and $c = \frac{1}{n}$, also $\frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}$ is sufficient.


The question arises whether among the statistics sufficient relative
to the distribution family $(P^*(\theta), \theta \in \Omega)$ there are some that are minimal in a certain
sense, that is, contain as few components as possible. The following example shows
that this problem is not artificial.


Example 1.5 Let $(P^*(\theta), \theta \in \Omega)$ be the family of $N(0, \sigma^2)$ normal distributions
(σ > 0). We consider the statistics

\[ M_1(Y) = Y, \]

\[ M_2(Y) = (y_1^2, \ldots, y_n^2)^T, \]

\[ M_3(Y) = \Big( \sum_{i=1}^{r} y_i^2,\; \sum_{i=r+1}^{n} y_i^2 \Big)^T, \quad r = 1, \ldots, n - 1, \]

\[ M_4(Y) = \sum_{i=1}^{n} y_i^2 \]

of a random sample $\boldsymbol{Y}$ of size n, which are all sufficient relative to $\sigma^2$. This can
easily be shown using Corollary 1.1 of Theorem 1.1 (decomposition theorem).
The likelihood functions of $M_1(Y)$ and Y are identical (and therefore equivalent).

Since both the $\boldsymbol{y}_i$ and the $\boldsymbol{y}_i^2$ are independent and the $\frac{\boldsymbol{y}_i^2}{\sigma^2} = \boldsymbol{\chi}_i^2$ are distributed as CS(1)
($\chi^2$-distribution with 1 degree of freedom; see Appendix A: Symbolism), it
follows after the transformation $y_i^2 = \sigma^2 \chi_i^2$:

\[ L_M(M_2(Y), \sigma^2) \sim L(Y, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} y_i^2}. \tag{1.11} \]

Analogously we proceed with $M_3(Y)$ and $M_4(Y)$.


Obviously, M,(Y) is the most extensive compression of the components of a 
random sample Y, and therefore it is preferable compared with other statistics. 


Definition 1.6 A statistic $M^*(Y)$ sufficient relative to $\theta$ is said to be minimal
sufficient relative to $\theta$ if it can be represented as a function of each other
sufficient statistic $M(Y)$.

If we consider Example 1.5, we have

$$M_4(Y) = M_1^T(Y)\,M_1(Y) = e_n^T M_2(Y) = (1\ \ 1)\,M_3(Y), \quad r = 1, \ldots, n-1.$$

Hence, $M_4(Y)$ can be written as a function of each sufficient statistic of this
example. This is not true for $M_1(Y)$, $M_2(Y)$ and $M_3(Y)$; they are not functions
of $M_4(Y)$. $M_4(Y)$ is the only statistic of Example 1.5 that could be minimal
sufficient relative to $\sigma^2$. We will see that it indeed has this property. But how can we
show minimal sufficiency? We recognise that the sample space can be decomposed
with the help of the statistic $M(Y)$ into disjoint subsets in such a way that
all $Y$ for which $M(Y)$ supplies the same value $M$ belong to the same subset. Vice
versa, a given decomposition defines a statistic. Now we present a decomposition
that is shown to generate a minimal sufficient statistic.


Definition 1.7 Let $Y_0 \in \{Y\}$ be a fixed point in the sample space (a certain
value of a realised random sample), which contains the realisations of a random
sample $Y$ with components from a family $(P^*(\theta), \theta \in \Omega)$ of probability
distributions. The likelihood function $L(Y, \theta)$ generates by

$$M(Y_0) = \{Y : L(Y,\theta) \sim L(Y_0,\theta)\} \tag{1.12}$$

a subset in $\{Y\}$. If $Y_0$ runs through the whole sample space $\{Y\}$, then a certain
decomposition is generated. This decomposition is called likelihood decomposition,
and the corresponding statistic $M_L(Y)$ satisfying $M_L(Y) = \text{const.}$ for all
$Y \in M(Y_0)$ and each $Y_0$ is called likelihood statistic.


Before we construct minimal sufficient statistics for some examples by this 
method, we state 


Theorem 1.2 The likelihood statistic $M_L(Y)$ is minimal sufficient relative to $\theta$.


Proof: Considering the likelihood statistic $M_L(Y)$, it holds

$$M_L(Y_1) = M_L(Y_2)$$

for $Y_1, Y_2 \in \{Y\}$ iff $L(Y_1, \theta) \sim L(Y_2, \theta)$ is fulfilled. Hence, $L(Y, \theta)$ is a function of
$M_L(Y)$ having the form

$$L(Y, \theta) = a(Y)\, g^*(M_L(Y),\theta). \tag{1.13}$$

Therefore $M_L(Y)$ is sufficient relative to $\theta$ taking Theorem 1.1 (decomposition
theorem) into account. If $M(Y)$ is any other statistic that is sufficient relative to
$\theta$, if further for two points $Y_1, Y_2 \in \{Y\}$ the relation $M(Y_1) = M(Y_2)$ is satisfied,
and finally, if $L(Y_i, \theta) > 0$ holds for $i = 1, 2$, then again Theorem 1.1 supplies

$$L(Y_1,\theta) = h(Y_1)\,g(M(Y_1),\theta) = h(Y_1)\,g(M(Y_2),\theta)$$

because of $M(Y_1) = M(Y_2)$ and

$$L(Y_2,\theta) = h(Y_2)\,g(M(Y_2),\theta) \quad\text{or equivalently}\quad g(M(Y_2),\theta) = \frac{L(Y_2,\theta)}{h(Y_2)}.$$

Hence, we obtain

$$L(Y_1,\theta) = \frac{h(Y_1)}{h(Y_2)}\, L(Y_2,\theta), \quad h(Y_2) > 0,$$

which means $L(Y_1, \theta) \sim L(Y_2, \theta)$. But this is just the condition for $M_L(Y_1) = M_L(Y_2)$.
Consequently $M_L(Y)$ is a function of $M(Y)$, independent of how $M(Y)$ is chosen,
that is, it is minimal sufficient.


We demonstrate the method giving two examples. 


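Example 1.6 Let the components $y_i$ of a random sample $Y$ follow a binomial
distribution $B(N, p)$, $N$ fixed and $0 < p < 1$. We look for a statistic that is minimal
sufficient relative to $p$. The likelihood function is

$$L(Y,p) = \prod_{i=1}^{n} \binom{N}{y_i} p^{y_i}(1-p)^{N-y_i}, \quad y_i = 0, 1, \ldots, N.$$

For all $Y_0 = (y_{01}, \ldots, y_{0n})^T \in \{Y\}$ with $L(Y_0, p) > 0$, it is

$$\frac{L(Y,p)}{L(Y_0,p)} = \frac{\prod_{i=1}^{n}\binom{N}{y_i}}{\prod_{i=1}^{n}\binom{N}{y_{0i}}} \left(\frac{p}{1-p}\right)^{\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} y_{0i}}.$$

Therefore $M(Y_0)$ is also defined by $M(Y_0) = \{Y : \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} y_{0i}\}$, since just
there $L(Y, p) \sim L(Y_0,p)$ holds. Hence $M(Y) = \sum_{i=1}^{n} y_i$ is a minimal sufficient
statistic.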
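
The likelihood decomposition of Example 1.6 can be made concrete with a small numerical sketch. The samples `Y`, `Y0` and `Y1` below are hypothetical and chosen only for illustration: `Y` and `Y0` have the same sum, `Y1` has a different one. The likelihood ratio should be free of $p$ in the first case only.

```python
from math import comb

def binom_likelihood(ys, N, p):
    """Likelihood of independent B(N, p) observations ys."""
    out = 1.0
    for y in ys:
        out *= comb(N, y) * p ** y * (1 - p) ** (N - y)
    return out

N = 5
Y = [1, 4, 2]    # hypothetical sample, sum 7
Y0 = [3, 3, 1]   # hypothetical reference point, sum 7
Y1 = [0, 1, 2]   # hypothetical sample, sum 3

def ratio(a, b, p):
    return binom_likelihood(a, N, p) / binom_likelihood(b, N, p)

p_values = (0.1, 0.4, 0.8)
same_sum = [ratio(Y, Y0, p) for p in p_values]   # constant in p
diff_sum = [ratio(Y1, Y0, p) for p in p_values]  # changes with p
print(same_sum)
print(diff_sum)
```

For equal sums only the quotient of the binomial coefficients remains, so `Y` and `Y0` lie in the same set of the likelihood decomposition, while `Y1` does not.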
Example 1.7 Let the components $y_i$ of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$
be gamma distributed. Then for $y_i > 0$ we get the likelihood function

$$L(Y,a,k) = \frac{a^{nk}}{[\Gamma(k)]^{n}}\; e^{-a\sum_{i=1}^{n} y_i} \prod_{i=1}^{n} y_i^{k-1}.$$

For all $Y_0 = (y_{01}, \ldots, y_{0n})^T \in \{Y\}$ with $L(Y_0, a, k) > 0$, it is

$$\frac{L(Y,a,k)}{L(Y_0,a,k)} = e^{-a\left(\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} y_{0i}\right)} \prod_{i=1}^{n} \left(\frac{y_i}{y_{0i}}\right)^{k-1}.$$

For given $a$ the product $\prod_{i=1}^{n} y_i$ is minimal sufficient relative to $k$. If $k$ is
known, then $\sum_{i=1}^{n} y_i$ is minimal sufficient relative to $a$. If both $a$ and $k$ are
unknown parameters, then $\left(\prod_{i=1}^{n} y_i, \sum_{i=1}^{n} y_i\right)$ is minimal sufficient relative to
$(a, k)$.


More generally the following statement holds:


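Theorem 1.3 If $(P^*(\theta), \theta \in \Omega)$ is a $k$-parametric exponential family with
likelihood function in canonical form

$$L(y,\eta) = e^{\sum_{j=1}^{k}\eta_j M_j(y) - A(\eta)}\, h(y),$$

where the dimension of the parameter space is $k$ (i.e. the $\eta_1, \ldots, \eta_k$ are linearly
independent), then

$$M(Y) = \Big(\sum_{i=1}^{n} M_1(y_i), \ldots, \sum_{i=1}^{n} M_k(y_i)\Big)^T$$

is minimal sufficient relative to $(P^*(\theta), \theta \in \Omega)$.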


Proof: The sufficiency of $M(Y)$ follows from Corollary 1.1 of the decomposition
theorem (Theorem 1.1), and the minimal sufficiency follows from the fact that
$M(Y)$ is the likelihood statistic, because it is $L(Y, \theta) \sim L(Y_0, \theta)$ if

$$\sum_{j=1}^{k} \eta_j \sum_{i=1}^{n} \left[M_j(y_i) - M_j(y_{0i})\right] = 0.$$

Regarding the linear independence of the $\eta_j$, this is only the case if $M(Y) = M(Y_0)$ is
fulfilled.


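Example 1.8 Let $(P^*(\theta), \theta \in \Omega)$ be the family of two-dimensional normal
distributions with the random variable $\binom{x}{y}$, the expectation $\mu = \binom{\mu_x}{\mu_y}$ and the
covariance matrix $\Sigma = \begin{pmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{pmatrix}$. This is a four-parametric exponential family
with the natural parameters

$$\eta_1 = \frac{\mu_x}{\sigma_x^2}, \quad \eta_2 = \frac{\mu_y}{\sigma_y^2}, \quad \eta_3 = -\frac{1}{2\sigma_x^2}, \quad \eta_4 = -\frac{1}{2\sigma_y^2}$$

and the factors $M_1 = x$, $M_2 = y$, $M_3 = x^2$, $M_4 = y^2$.


If $\dim(\Omega) = 4$, then

$$M = \Big(\sum_{i=1}^{n} M_{1i}, \sum_{i=1}^{n} M_{2i}, \sum_{i=1}^{n} M_{3i}, \sum_{i=1}^{n} M_{4i}\Big)^T$$

is minimal sufficient relative to $(P^*(\theta), \theta \in \Omega)$. Assuming that
$(P^{**}(\theta), \theta \in \Omega) \subset (P^*(\theta), \theta \in \Omega)$ is the subfamily of $(P^*(\theta), \theta \in \Omega)$ with
$\sigma_x^2 = \sigma_y^2 = \sigma^2$, then $\dim(\Omega) = 3$ follows, and $M$ is not minimal sufficient relative
to $(P^{**}(\theta), \theta \in \Omega)$.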


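The natural parameters of $(P^{**}(\theta), \theta \in \Omega)$ are

$$\eta_1 = \frac{\mu_x}{\sigma^2}, \quad \eta_2 = \frac{\mu_y}{\sigma^2}, \quad \eta_3 = -\frac{1}{2\sigma^2}.$$

Further, taking $h \equiv 1$, we have $A(\eta) = \frac{1}{2\sigma^2}\left(\mu_x^2 + \mu_y^2\right) + \ln(2\pi\sigma^2)$, and the factors of the $\eta_j$ are

$$M_1 = x, \quad M_2 = y, \quad M_3 = x^2 + y^2.$$

Relative to $(P^{**}(\theta), \theta \in \Omega)$, the statistic

$$\Big(\sum_{i=1}^{n} x_i, \sum_{i=1}^{n} y_i, \sum_{i=1}^{n} \left(x_i^2 + y_i^2\right)\Big)^T$$

is minimal sufficient.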
As will be shown in Chapter 6 for model II of analysis of variance, the result
of Theorem 1.3 is also suitable for finding minimal sufficient statistics in more
sophisticated models.

Completeness and bounded completeness are further important properties 
for the theory of estimation. We want to introduce both together by the follow- 
ing definition. 


Definition 1.8 A distribution family $P = (P_\theta, \theta \in \Omega)$ with distribution function
$F(y, \theta)$, $\theta \in \Omega$, is said to be complete, if for each $P$-integrable function $h(y)$ of the
random variable $y$ the condition

$$E_\theta[h(y)] = \int h(y)\,dF(y,\theta) = 0 \quad\text{for all } \theta \in \Omega \tag{1.14}$$

implies the relation

$$P_\theta[h(y) = 0] = 1 \quad\text{for all } \theta \in \Omega. \tag{1.15}$$

If this is true only for bounded functions $h(y)$, then $P = (P_\theta, \theta \in \Omega)$ is called
bounded complete.


We want to consider an example for a complete distribution family. 


Example 1.9 Let $P$ be the family $\{P_p\}$, $p \in (0,1)$ of binomial distributions with
the probability function

$$p(y,p) = \binom{n}{y} p^y (1-p)^{n-y} = \binom{n}{y} v^y (1-p)^n, \quad y = 0, 1, \ldots, n, \quad v = \frac{p}{1-p}.$$

Integrability of $h(y)$ means finiteness of $(1-p)^n \sum_{y=0}^{n} h(y)\binom{n}{y} v^y$, and (1.14)
implies

$$\sum_{y=0}^{n} h(y)\binom{n}{y} v^y = 0 \quad\text{for all } p \in (0,1).$$

The left-hand side of the equation is a polynomial of $n$th degree in $v$, which
has at most $n$ real zeros. To fulfil this equation for all $v \in R^+$, the factor $\binom{n}{y} h(y)$
must vanish for $y = 0, 1, \ldots, n$, and because of $\binom{n}{y} > 0$, it follows

$$P_p[h(y) = 0] = 1 \quad\text{for all } p \in (0,1).$$


Theorem 1.4 A $k$-parametric exponential family of the distribution of a
sufficient statistic is complete under the assumptions of Theorem 1.3 ($\dim(\Omega) = k$).


The proof can be found in Lehmann and Romano (2008). 


Definition 1.9 Let a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be given whose
components satisfy a distribution from the family

$$P^* = (P_\theta, \theta \in \Omega).$$

A statistic $M(Y)$, whose distribution is independent of $\theta$, is called an ancillary
statistic. If $P$ is the family of distributions induced by the statistic $M(Y)$ in $P^*$ and
if $P$ is complete and $M(Y)$ is sufficient relative to $P^*$, then $M(Y)$ is said to be
complete sufficient.


Example 1.10 Let $P^*$ be the family of normal distributions $N(\mu, 1)$ with
expectation $\mu = \theta$ and variance 1, that is, $\Omega = R^1$. This is a one-parametric
exponential family with $\dim(\Omega) = 1$, which is complete by Theorem 1.4. If
$Y = (y_1, y_2, \ldots, y_n)^T$ is a random sample with components from $P^*$, then
$M_1(Y) = \bar{y}$ is distributed as $N(\mu, \frac{1}{n})$. Consequently the family of distributions
$P^*$ is also complete. Because of Theorem 1.3, $\bar{y}$ is minimal sufficient and
therefore complete sufficient. The distribution family of CS($n-1$)-distributions
($\chi^2$-distributions with $n-1$ degrees of freedom) induced by
$(n-1)M_2(Y) = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$ is independent of $\mu$. Hence $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ is
an ancillary statistic relative to $\mu = \theta$.


We close this section with the following statement: 


Theorem 1.5 Let $Y$ be a random sample with components from $P = (P_\theta, \theta \in \Omega)$
and let $M_1(Y)$ be bounded complete sufficient relative to $P$. Further, if $M_2(Y)$ is a
statistic with a distribution independent of $\theta$, then $M_1(Y)$ and $M_2(Y)$ are
(stochastically) independent.


Proof: Let $\{Y_0\} \subset \{Y\}$ be a subset of the sample space $\{Y\}$. Then $M_2(Y)$ maps $\{Y\}$
onto $\{M\}$ and $\{Y_0\}$ onto $\{M_0\}$. Since the distribution of $M_2(Y)$ is independent of $\theta$,
$P[M_2(Y) \in \{M_0\}]$ is independent of $\theta$. Moreover, observing the sufficiency of
$M_1(Y)$ relative to $\theta$, also $P[M_2(Y) \in \{M_0\} \mid M_1(Y)]$ is independent of $\theta$. We consider
the statistic

$$h(M_1(Y)) = P[M_2(Y) \in \{M_0\} \mid M_1(Y)] - P[M_2(Y) \in \{M_0\}]$$

depending on $M_1(Y)$, such that analogously to (1.14)

$$E_\theta[h(M_1(Y))] = E_\theta\big[P[M_2(Y) \in \{M_0\} \mid M_1(Y)] - P[M_2(Y) \in \{M_0\}]\big] = 0$$

follows for all $\theta \in \Omega$. Since $M_1(Y)$ is bounded complete,

$$P[M_2(Y) \in \{M_0\} \mid M_1(Y)] - P[M_2(Y) \in \{M_0\}] = 0$$

holds for all $\theta \in \Omega$ with probability 1, analogously to (1.15). But this means that
$M_1(Y)$ and $M_2(Y)$ are independent.


1.4 The Notion of Information in Statistics 


Concerning the heuristic introduction of sufficient statistics in Section 1.2, we 
emphasised that a statistic should exhaust the information of a sample to a large 
extent. Now we turn to the question what the information of a sample really 
means. The notion of information was introduced by R. A. Fisher in the field 
of statistics, and his definition is still today of great importance. We speak of 
the Fisher information in this connection. A further notion of information ori- 
ginates from Kullback and Leibler (1951), but we do not present this definition 
here. We restrict ourselves in this section at first to distribution families 


$$P = (P_\theta, \theta \in \Omega), \quad \Omega \subset R^1$$

with real parameters $\theta$. We denote the likelihood function ($Y = y$) of $P$ by $L(y, \theta)$.


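Definition 1.10 Let $y$ be distributed as

$$P_\theta \in P = (P_\theta, \theta \in \Omega), \quad \Omega \subset R^1.$$

Further let the following assumption V1 be fulfilled:
1) $\Omega$ is an open interval.
2) For each $y \in \{Y\}$ and for each $\theta \in \Omega$, the derivative $\frac{\partial}{\partial\theta} L(y, \theta)$ exists and is finite.
The set of points satisfying $L(y, \theta) = 0$ does not depend on $\theta$.
3) For each $\theta \in \Omega$ there exist an $\varepsilon > 0$ and a positive $P_\theta$-integrable function $k(y,\theta)$
such that for all $\theta_0$ in an $\varepsilon$-neighbourhood of $\theta$ the inequality

$$\left|\frac{L(y,\theta) - L(y,\theta_0)}{\theta - \theta_0}\right| \le k(y,\theta)$$

holds.
4) The derivative $\frac{\partial}{\partial\theta} \ln L(y, \theta)$ is quadratic $P_\theta$-integrable, and it holds for all $\theta \in \Omega$

$$0 < E_\theta\left\{\left[\frac{\partial}{\partial\theta}\ln L(y,\theta)\right]^2\right\}.$$

Then the expectation

$$I(\theta) = E_\theta\left\{\left[\frac{\partial}{\partial\theta}\ln L(y,\theta)\right]^2\right\} \tag{1.16}$$

is said to be the Fisher information of the distribution $P_\theta$ and of the variable $y$,
respectively.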


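It follows from the third condition of V1 that $P_\theta$-integration and differentiation
by $\theta$ can be exchanged for $L(y, \theta)$, and because of

$$\int_{\{Y\}} L(y,\theta)\,dy = 1$$

we obtain

$$E_\theta\left[\frac{\partial}{\partial\theta}\ln L(y,\theta)\right] = \int_{\{Y\}} \frac{\partial}{\partial\theta}\ln L(y,\theta)\,L(y,\theta)\,dy = \frac{\partial}{\partial\theta}\int_{\{Y\}} L(y,\theta)\,dy = \frac{\partial}{\partial\theta} 1 = 0$$

for all $\theta \in \Omega$. Hence, we have

$$I(\theta) = \operatorname{var}_\theta\left\{\frac{\partial}{\partial\theta}\ln L(y,\theta)\right\}. \tag{1.17}$$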


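Now let the second derivative of $\ln L(y, \theta)$ with respect to $\theta$ exist for all $y$ and $\theta$,
and let $\int_{\{Y\}} L(y,\theta)\,dy$ be differentiable twice, where integration and double
differentiation can be commuted. Then by considering

$$\frac{\partial^2}{\partial\theta^2}\ln L(y,\theta) = \frac{\frac{\partial^2}{\partial\theta^2}L(y,\theta)}{L(y,\theta)} - \left[\frac{\partial}{\partial\theta}\ln L(y,\theta)\right]^2$$

and

$$E_\theta\left[\frac{\frac{\partial^2}{\partial\theta^2}L(y,\theta)}{L(y,\theta)}\right] = \int_{\{Y\}} \frac{\partial^2}{\partial\theta^2}L(y,\theta)\,dy = \frac{\partial^2}{\partial\theta^2}\int_{\{Y\}} L(y,\theta)\,dy = 0,$$

the relation

$$E_\theta\left[\frac{\partial^2}{\partial\theta^2}\ln L(y,\theta)\right] = -E_\theta\left\{\left[\frac{\partial}{\partial\theta}\ln L(y,\theta)\right]^2\right\}$$

follows and therefore also

$$I(\theta) = -E_\theta\left[\frac{\partial^2}{\partial\theta^2}\ln L(y,\theta)\right]. \tag{1.18}$$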
We present examples determining the Fisher information for a discrete
and for a continuous distribution.


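Example 1.11 Let $P$ be the family of binomial distributions for given $n$ and
$\Omega = (0,1)$. The likelihood function is

$$L(y,p) = \binom{n}{y} p^y (1-p)^{n-y}.$$

The assumption V1 is satisfied, namely, the square of

$$\frac{\partial}{\partial p}\ln L(y,p) = \frac{y}{p} - \frac{n-y}{1-p}$$

after replacing $y$ by the random $y$ has the finite expectation

$$I(p) = E_p\left\{\left[\frac{\partial}{\partial p}\ln L(y,p)\right]^2\right\} = \sum_{y=0}^{n}\left(\frac{y}{p} - \frac{n-y}{1-p}\right)^2 \binom{n}{y} p^y (1-p)^{n-y}$$

and this means

$$I(p) = \frac{n}{p(1-p)}, \quad 0 < p < 1.$$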
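
The closed form in Example 1.11 can be confirmed by evaluating the defining sum directly. The following is a minimal sketch; the function name and the values of `n` and `p` are our own choices for illustration.

```python
from math import comb

def fisher_info_binomial(n, p):
    """E[(d/dp ln L(y, p))^2], summed over y = 0, ..., n for the binomial likelihood."""
    total = 0.0
    for y in range(n + 1):
        score = y / p - (n - y) / (1 - p)          # derivative of ln L with respect to p
        total += score ** 2 * comb(n, y) * p ** y * (1 - p) ** (n - y)
    return total

n, p = 12, 0.3                      # hypothetical values
numeric = fisher_info_binomial(n, p)
closed_form = n / (p * (1 - p))     # I(p) = n / (p (1 - p))
print(numeric, closed_form)
```

Both numbers agree up to rounding error for any $n$ and any $p \in (0,1)$.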
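Example 1.12 Let $P$ be the family of normal distributions $N(\mu, \sigma^2)$ with
known $\sigma^2$. It is $\Omega = R^1$, and the likelihood function has the form

$$L(y,\mu) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(y-\mu)^2}.$$

For these distributions assumption V1 is fulfilled, too. We obtain

$$\frac{\partial}{\partial\mu}\ln L(y,\mu) = \frac{1}{\sigma^2}(y-\mu)$$

and

$$I(\mu) = E\left[\frac{1}{\sigma^4}(y-\mu)^2\right] = \frac{1}{\sigma^4}\operatorname{var}(y) = \frac{1}{\sigma^2}.$$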


Now we show the additivity of the Fisher information. 


Theorem 1.6 If the Fisher information $I(\theta) = I_1(\theta)$ exists for a family $P$ of
probability distributions with $\Omega \subset R^1$ and if $Y = (y_1, y_2, \ldots, y_n)^T$ is a random sample
with components $y_i$ ($i = 1, \ldots, n$) all distributed as $P_\theta \in P$, then the Fisher
information $I_n(\theta)$ of the distribution corresponding to $Y$ is given by

$$I_n(\theta) = nI_1(\theta). \tag{1.19}$$
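Proof: It follows from Definition 1.2 that the likelihood function $L_n(Y, \theta)$ of a
random sample $Y$ is

$$L_n(Y,\theta) = \prod_{i=1}^{n} L(y_i,\theta).$$

Consequently we get

$$\ln L_n(Y,\theta) = \sum_{i=1}^{n} \ln L(y_i,\theta)$$

and

$$\frac{\partial}{\partial\theta}\ln L_n(Y,\theta) = \sum_{i=1}^{n} \frac{\partial}{\partial\theta}\ln L(y_i,\theta).$$

Observing (1.17) and the independence of the components $y_i$, we finally arrive at

$$I_n(\theta) = \operatorname{var}\left\{\frac{\partial}{\partial\theta}\ln L_n(Y,\theta)\right\} = \sum_{i=1}^{n} \operatorname{var}\left\{\frac{\partial}{\partial\theta}\ln L(y_i,\theta)\right\} = nI_1(\theta).$$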
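
The additivity (1.19) can be illustrated by brute force. The sketch below is our own illustration and assumes Bernoulli components (a binomial distribution with one trial), which the text does not single out; it enumerates all $2^n$ outcomes of the sample and compares the information of the whole sample with $n$ times the information of one component.

```python
from itertools import product

def score_bernoulli(y, p):
    """d/dp ln L(y, p) for one Bernoulli observation y in {0, 1}."""
    return y / p - (1 - y) / (1 - p)

def info_single(p):
    """Fisher information I_1(p) of one Bernoulli observation."""
    return sum(score_bernoulli(y, p) ** 2 * (p if y else 1 - p) for y in (0, 1))

def info_sample(n, p):
    """Fisher information I_n(p) of a sample of size n, by full enumeration."""
    total = 0.0
    for ys in product((0, 1), repeat=n):
        prob = 1.0
        for y in ys:
            prob *= p if y else 1 - p
        # The score of the sample is the sum of the component scores.
        total += sum(score_bernoulli(y, p) for y in ys) ** 2 * prob
    return total

n, p = 6, 0.3   # hypothetical values
print(info_sample(n, p), n * info_single(p))
```

The two printed values coincide because the cross terms of the squared sum have expectation zero for independent components.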


Theorem 1.7 Let $M(Y)$ be a sufficient statistic with respect to the distribution
$P_\theta \in P$, $\Omega \subset R^1$ of the components of the random sample $Y = (y_1, y_2, \ldots, y_n)^T$. Let
the distribution $P_\theta$ fulfil the condition V1 of Definition 1.10. Then the Fisher
information

$$I_M(\theta) = E_\theta\left\{\left[\frac{\partial}{\partial\theta}\ln L_M(M,\theta)\right]^2\right\} \tag{1.20}$$

of $M = M(Y)$ exists, where $L_M(M, \theta)$ is the likelihood function of $M$, and

$$I_n(\theta) = I_M(\theta). \tag{1.21}$$
Proof: Considering (1.2) and (1.3), respectively, we have

$$L(Y,\theta) = h(Y)\,g(M(Y),\theta)$$

and therefore

$$\frac{\partial}{\partial\theta}\ln L(Y,\theta) = \frac{\partial}{\partial\theta}\ln g(M(Y),\theta)$$

since $h(Y)$ is by assumption independent of $\theta$. Taking Corollary 1.1 of
Theorem 1.1 into account, the likelihood function $L_M(M, \theta)$ of $M$ also satisfies
condition V1 of Definition 1.10, and therefore $I_M(\theta)$ in (1.20) exists. Observing
the equivalence of $L_M(M,\theta)$ and $L(Y, \theta)$, the assertion follows because

$$\frac{\partial}{\partial\theta}\ln L_M(M,\theta) = \frac{\partial}{\partial\theta}\ln g(M,\theta).$$
Consequently the Fisher information of a sufficient statistic is the Fisher
information of the corresponding random sample.
Now we consider parameters $\theta \in \Omega \subset R^p$.


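Definition 1.11 Let $y$ be distributed as $P_\theta \in P$, $\Omega \subset R^p$, $\theta = (\theta_1, \ldots, \theta_p)^T$. Let
the conditions 2, 3 and 4 of V1 in Definition 1.10 be fulfilled for each component
$\theta_i$ ($i = 1, \ldots, p$). Let $\Omega$ be an open interval in $R^p$. Further, assume that the
expectation of $\frac{\partial}{\partial\theta_i}\ln L(y,\theta)\,\frac{\partial}{\partial\theta_j}\ln L(y,\theta)$ exists for all $\theta$ and all $i, j = 1, \ldots, p$. Then the
quadratic matrix

$$I(\theta) = (I_{ij}(\theta)), \quad i, j = 1, \ldots, p$$

of order $p$ given by

$$I_{ij}(\theta) = E_\theta\left\{\frac{\partial}{\partial\theta_i}\ln L(y,\theta)\,\frac{\partial}{\partial\theta_j}\ln L(y,\theta)\right\}$$

is said to be the (Fisher) information matrix with respect to $P_\theta$.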


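Example 1.13 Let the random variable $y$ be distributed as $N(\mu, \sigma^2)$ where
$\theta = (\mu, \sigma^2)^T \in R^1 \times R^+ = \Omega$. Then

$$\ln L(y,\theta) = -\ln\sqrt{2\pi} - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(y-\mu)^2$$

holds, and the assumption of Definition 1.11 is fulfilled with $\theta_1 = \mu$ and $\theta_2 = \sigma^2$.
Further we have

$$\frac{\partial}{\partial\theta_1}\ln L(y,\theta) = \frac{y-\mu}{\sigma^2} \quad\text{and}\quad \frac{\partial}{\partial\theta_2}\ln L(y,\theta) = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}(y-\mu)^2.$$

Because $E[(y - \mu)^2] = \operatorname{var}(y) = \sigma^2$, it follows $I_{11}(\theta) = \frac{1}{\sigma^2}$. Since the skewness
$\gamma_1 = 0$ and since $E(y - \mu) = 0$, we get $I_{12}(\theta) = I_{21}(\theta) = 0$. Further

$$\left[\frac{\partial}{\partial\sigma^2}\ln L(y,\theta)\right]^2 = \frac{1}{4\sigma^4} - \frac{(y-\mu)^2}{2\sigma^6} + \frac{(y-\mu)^4}{4\sigma^8}.$$

Moreover, considering $\gamma_2 = 0$ and $E[(y - \mu)^4] = 3\sigma^4$, it follows

$$I_{22}(\theta) = E\left\{\left[\frac{\partial}{\partial\sigma^2}\ln L(y,\theta)\right]^2\right\} = \frac{1}{4\sigma^4} - \frac{2}{4\sigma^4} + \frac{3}{4\sigma^4} = \frac{1}{2\sigma^4}$$

and we obtain

$$I(\theta) = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}.$$

On the other hand, if we put $\theta_1 = \mu$ and $\theta_2 = \sigma$, then we have

$$\frac{\partial}{\partial\sigma}\ln L(y,\theta) = -\frac{1}{\sigma} + \frac{1}{\sigma^3}(y-\mu)^2.$$

While $I_{11}$, $I_{12}$ and $I_{21}$ remain unchanged, we find now

$$I_{22}(\theta) = E\left\{\left[\frac{\partial}{\partial\sigma}\ln L(y,\theta)\right]^2\right\} = \frac{1}{\sigma^2}(1 - 2 + 3) = \frac{2}{\sigma^2}$$

and therefore

$$I(\theta) = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{2}{\sigma^2} \end{pmatrix}.$$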

This example shows that the Fisher information is not invariant with respect 
to parameter transformations. Using the chain rule of differential calculus, the 
following general statement arises. 


Theorem 1.8 Let $\psi = h(\theta)$ be a function that is monotone in $\Omega \subset R^1$ and differentiable
with respect to $\theta$, and let $h$ map $\Omega$ onto $\Psi$. Then the inverse function $\theta = g(\psi)$,
differentiable with respect to $\psi$, exists. Under the assumptions of Definition
1.10, let $I(\theta)$ be the Fisher information of the distribution $P_\theta \in P$. Then the Fisher
information $I^*(\psi)$ of the distribution $P_\psi$ (i.e. $P_\theta$ written with the transformed
parameter) is

$$I^*(\psi) = I(\theta)\left(\frac{dg(\psi)}{d\psi}\right)^2. \tag{1.22}$$


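Considering Example 1.13 we set (for fixed $\mu$) $\theta = \sigma^2$, $\psi = \sqrt{\theta} = \sigma$ and
$\frac{d\theta}{d\psi} = 2\psi = 2\sigma$. By Theorem 1.8 we get with $I(\sigma^2) = \frac{1}{2\sigma^4}$ the information

$$I^*(\sigma) = \frac{1}{2\sigma^4}\cdot 4\sigma^2 = \frac{2}{\sigma^2}.$$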
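
The transformation rule (1.22) can also be checked numerically for the normal distribution of Example 1.13. The sketch below is our own illustration: the helper `expect` approximates expectations by a simple trapezoidal rule, and the values of `mu` and `sigma` are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def expect(fn, mu, sigma, half_width=12.0, steps=40000):
    """Trapezoidal approximation of E[fn(y)] for y distributed as N(mu, sigma^2)."""
    a, b = mu - half_width * sigma, mu + half_width * sigma
    h = (b - a) / steps
    total = 0.5 * (fn(a) * normal_pdf(a, mu, sigma) + fn(b) * normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        y = a + i * h
        total += fn(y) * normal_pdf(y, mu, sigma)
    return total * h

mu, sigma = 1.0, 1.5   # hypothetical parameter values

def score_var(y):
    """Derivative of ln L with respect to theta = sigma^2."""
    return -1 / (2 * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 4)

def score_sd(y):
    """Derivative of ln L with respect to psi = sigma."""
    return -1 / sigma + (y - mu) ** 2 / sigma ** 3

info_var = expect(lambda y: score_var(y) ** 2, mu, sigma)   # close to 1 / (2 sigma^4)
info_sd = expect(lambda y: score_sd(y) ** 2, mu, sigma)     # close to 2 / sigma^2

# Theorem 1.8 with theta = g(psi) = psi^2, so dg/dpsi = 2 sigma.
info_sd_via_theorem = info_var * (2 * sigma) ** 2
print(info_var, info_sd, info_sd_via_theorem)
```

The directly computed information for $\sigma$ agrees with the value obtained from the information for $\sigma^2$ via (1.22).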


In Chapter 2 we need the following statement. 
Theorem 1.9 (Inequality of Rao and Cramér)

Let assumption V1 of Definition 1.10 hold for all components of the random sample
$Y$ possessing the likelihood function $L(Y, \theta)$. Let the set $\{Y_0\} = \{Y \in \{Y\} : L(Y, \theta) = 0\}$
of the points in the sample space satisfying $L(Y, \theta) = 0$ not depend on $\theta$. Let
$P_\theta \in P = (P_\theta, \theta \in \Omega)$, $\Omega \subset R^1$ be the distribution of the components, and let $M(Y)$
be a statistic with expectation $E[M(Y)]$ and variance $\operatorname{var}[M(Y)]$ mapping the sample
space $\{Y\}$ into $\Omega$. Then the inequality of Rao and Cramér

$$\operatorname{var}[M(Y)] \ge \frac{\left(\frac{dE[M(Y)]}{d\theta}\right)^2}{I_n(\theta)} \tag{1.23}$$

is fulfilled.
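Proof: With the notation $M(Y) = M$, we get $E = E[M - E(M)] = 0$. Hence

$$\frac{dE}{d\theta} = -\int_{\{Y\}} \frac{dE(M)}{d\theta}\,dP_\theta + \int_{\{Y\}} (M - E(M))\,\frac{\partial}{\partial\theta}L(Y,\theta)\,dY = 0$$

and

$$-\frac{dE(M)}{d\theta}\int_{\{Y\}} dP_\theta + \int_{\{Y\}} (M - E(M))\,\frac{\partial}{\partial\theta}\ln L(Y,\theta)\,dP_\theta = 0$$

hold, respectively. Then

$$E\left\{(M - E(M))\,\frac{\partial}{\partial\theta}\ln L(Y,\theta)\right\} = \frac{dE(M)}{d\theta}$$

follows. Taking Schwarz's inequality into account, we arrive at

$$\left[\frac{dE(M)}{d\theta}\right]^2 \le E\left\{[M - E(M)]^2\right\}\, E\left\{\left[\frac{\partial}{\partial\theta}\ln L(Y,\theta)\right]^2\right\}.$$

Considering (1.16) completes the proof.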
When choosing $E(M) = \theta$, the inequality of Rao and Cramér takes
the form

$$\operatorname{var}[M(Y)] \ge \frac{1}{I_n(\theta)}. \tag{1.24}$$
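
A case in which the bound (1.24) is attained can be verified exactly. The sketch below is our own illustration: it takes a single binomial observation $y$ from $B(n, p)$ as the sample, uses the statistic $M = y/n$ (so that $E(M) = p$) and the information $n / (p(1-p))$ from Example 1.11. The function name and the numerical values are hypothetical.

```python
from math import comb

def binomial_moments(n, p, stat):
    """Exact mean and variance of stat(y) for y distributed as B(n, p)."""
    probs = [comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(n + 1)]
    mean = sum(stat(y) * q for y, q in enumerate(probs))
    var = sum((stat(y) - mean) ** 2 * q for y, q in enumerate(probs))
    return mean, var

n, p = 20, 0.35                                        # hypothetical values
mean, var = binomial_moments(n, p, lambda y: y / n)    # statistic M = y / n
bound = p * (1 - p) / n                                # 1 / I(p) with I(p) = n / (p (1 - p))
print(mean, var, bound)
```

Here the variance of $M$ equals the lower bound, so no unbiased statistic can have a smaller variance in this model.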
Theorem 1.10 If $y$ is distributed as a one-parametric exponential family and if
$g(\theta) = \eta = E(M)$, then

$$I^*(\eta) = \frac{1}{\operatorname{var}(M)}. \tag{1.25}$$


Proof: Since the assumption V1 of Definition 1.10 is fulfilled, (4) exists. 
Observing 


d d 
Ao) ae as 


and Theorem 1.8, we obtain 


A(n) 


var(M) = I*(n) =1(0@)[var(M)). 


Hence, the assertion is true. 


Considering Schwarz's inequality for the second moments of a statistic $M(Y)$
with finite second moment and an arbitrary function $h(Y,\theta)$ with existing second
moment, the inequality

$$\operatorname{var}(M) \ge \frac{\operatorname{cov}^2[M, h(Y,\theta)]}{\operatorname{var}[h(Y,\theta)]}$$

follows.


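Theorem 1.11 Let $M(Y)$ be a statistic with expectation $g(\theta)$ and existing
second moment, and let $h_j = h_j(Y, \theta)$, $j = 1, \ldots, r$ be functions with existing second
moments. Then with the notations

$$c_j = \operatorname{cov}(M(Y), h_j), \quad \sigma_{ij} = \operatorname{cov}(h_i, h_j), \quad c^T = (c_1, \ldots, c_r)$$

and $\Sigma = (\sigma_{ij})$, ($|\Sigma| \ne 0$), the inequality

$$\operatorname{var}(M) \ge c^T \Sigma^{-1} c \tag{1.26}$$

is fulfilled.
Proof: The assertion follows from $\frac{c^T \Sigma^{-1} c}{\operatorname{var}(M)} \le 1$.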


With the help of (1.26), the inequality (1.24) of Rao and Cramér can be 
generalised to the p-dimensional case. 


Theorem 1.12 Let the components of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be
distributed as

$$P_\theta \in P = (P_\theta, \theta \in \Omega), \quad \Omega \subset R^p, \quad \theta^T = (\theta_1, \ldots, \theta_p), \quad p > 1.$$

Let $L(Y,\theta)$ be the likelihood function of $Y$. Further, let the assumptions of
Definition 1.10 be fulfilled. Additionally, let the set of points in $\{Y\}$ satisfying
$L(Y, \theta) = 0$ not depend on $\theta$. Finally, let $M(Y)$ be a statistic, whose expectation
$E[M(Y)] = w(\theta)$ exists and is differentiable with respect to all $\theta_i$. Then the inequality

$$\operatorname{var}[M(Y)] \ge a^T I^{-1} a$$

holds, where $I^{-1}$ is the inverse of $I(\theta)$ and $a$ is the vector of the derivatives of $w(\theta)$
with respect to the $\theta_i$.


Proof: As $I(\theta)$ is positive definite and therefore $I^{-1}$ exists, the assertion follows
with $h_j = \frac{\partial}{\partial\theta_j}\ln L(Y,\theta)$ from (1.26) considering Definition 1.11.


1.5 Statistical Decision Theory 


First, we formulate the general statistical decision problem. Let us start from the
assumption that there is a set of random variables $\{y_t\}$, $t \in R^1$, whose distribution
$P_\theta \in P = (P_\theta, \theta \in \Omega)$, $\dim(\Omega) = p$, is at least partly unknown.

Here we restrict ourselves to the case that only statements about $\psi = g(\theta)$ are
demanded, where $\Omega$ is mapped by $g$ onto $Z$ and $\dim(Z) = s$. This set $Z$ is called
state space.

The statistician has for statements about $\psi$ a set $\{E\}$ of decisions at his
disposal. $\{E\}$ is called decision space. Let $Y_{t_i} = (y_{t_i 1}, \ldots, y_{t_i n_i})^T$ for each fixed $t_i$ be
a random sample of size $n_i$.

Let the set of results of an experiment for which a decision has to be made (i.e.
a decision has to be selected from $\{E\}$) be, with $N = \sum_{i=1}^{k} n_i$,

$$A_k = (Y_{t_1}, \ldots, Y_{t_k}) \in \prod_{i=1}^{k} \{Y_{t_i}\} = \{Y_{k,N}\},$$

the realisation of a random variable $A_k = (Y_{t_1}, \ldots, Y_{t_k})$. Now let $d \in D$ be a
measurable mapping from $\{Y_{k,N}\}$ onto $E$, which relates each $A_k$ to a decision $d(A_k)$.
Then $d$ is called a decision function, and $D$ is the set of admissible decision
functions. $A_k$ will depend on the distribution of $A_k$, the support $S_k = (t_1, \ldots, t_k)$ of the
experiment and its allocation vector $N_k = (n_1, \ldots, n_k)$. We denote the concrete
design by

$$V_{k,N} = \begin{pmatrix} S_k \\ N_k \end{pmatrix} = \begin{pmatrix} t_1, & t_2, & \ldots, & t_k \\ n_1, & n_2, & \ldots, & n_k \end{pmatrix}$$

belonging to a set $V$ of admissible designs. Additionally, let a loss function $L$ be
given as a measurable mapping from $E \times Z \times R^1$ into $R^1$ (its definition, and
therefore that of $M$, is a problem outside of mathematics), which means

$$L[d(A_k), \psi, f(M)], \quad d(A_k) \in E, \ \psi \in Z \tag{1.27}$$

with a non-negative real function $f(M)$ where $M = (d, S_k, N_k)$.
The function $L$ registers the loss occurring if $d(A_k)$ is chosen and $\psi$ is the value
in the transformed parameter space, while $f(M)$ corresponds to the costs for
realising $M$. The task of statistics consists in providing methods for the selection
of triples $M = (d, S_k, N_k)$ that minimise a functional $R(d, S_k, N_k, \psi, f(M))$
of the random loss, called risk function $R$. We will denote by $d$ either a decision
function (for fixed $n$) or a sequence of decision functions of the same structure,
whose elements differ only with respect to the sample size $n$. We assume that

$$R(d, S_k, N_k, \psi, f(M)) = F(d, S_k, N_k, \psi) + f(S_k, N_k), \tag{1.28}$$

where $f$ does not depend on $d$ and $d^*$ is the decision function (sequence of
decision functions) for which

$$\min_{d \in D} F(d, S_k, N_k, \psi) = F(d^*, S_k, N_k, \psi) \tag{1.29}$$

is satisfied. Then the risk $R$ can be minimised in two steps. First, $d^*$ is determined
in such a way that (1.29) is fulfilled, and second, $(S_k^*, N_k^*)$ is chosen to fulfil

$$R(d^*, S_k^*, N_k^*, \psi, f(S_k^*, N_k^*)) = \min_{(S_k, N_k) \in V} R(d^*, S_k, N_k, \psi, f(S_k, N_k)).$$


Definition 1.12 A triple $M^* \in V \times D$ is said to be locally $R$-optimal at the point
$\psi_0 \in Z$ relative to $V \times D$ if for all $M \in V \times D$ the inequality

$$R[M^*, \psi_0, f(M^*)] \le R[M, \psi_0, f(M)]$$

holds. If $M^*$ is locally $R$-optimal for all $\psi_0 \in Z$, then $M^*$ is called globally
$R$-optimal.
Example 1.14 Let $k = 1$ and $y_t = y$ be distributed as $N(\mu, \sigma^2)$. Then
$\theta = (\mu, \sigma^2)^T$, $\Omega = R^1 \times R^+$ and $A_1 = Y$. Further, let $\psi = g(\theta) = \mu \in R^1$ and
$d(Y) = \hat{\mu}$ be a statistic with realisations in $R^1 = \{E\}$. Finally, let $D$ be the class
of statistics with finite second moment and realisations in $R^1$. Now we choose
the loss function

$$L[\hat{\mu}, \mu, f(M)] = c_1(\mu - \hat{\mu})^2 + c_2 nK, \quad c_1, c_2, K > 0,$$

where $K$ represents the costs of an experiment (a measurement). Besides we
define as risk $R$ the expected random loss

$$R(\hat{\mu}, n, \mu, Kn) = E[c_1(\mu - \hat{\mu})^2 + c_2 nK] = c_2 nK + c_1\left[\operatorname{var}(\hat{\mu}) + B(\hat{\mu})^2\right],$$

where $B(\hat{\mu}) = E(\hat{\mu}) - \mu$. In the class $D$ the choice $\hat{\mu} = \mu_0$ is together with $n = 0$
locally $R$-optimal for the decision, and for $(\mu_0, 0)$ the risk $R$ is equal to 0. The
class $D$ can be restricted to exclude this unsatisfactory trivial case. We denote
by $D_E \subset D$ the subset in $D$ with $B(\hat{\mu}) = 0$. Then we obtain

$$R(\hat{\mu}, n, \mu, Kn) = c_2 nK + c_1 \operatorname{var}(\hat{\mu}), \quad \hat{\mu} \in D_E$$

in the form (1.28). We will see in Chapter 2 that $\operatorname{var}(\hat{\mu})$ becomes minimal
for $\hat{\mu}^* = \bar{y}$.
Since for a random sample of $n$ elements we have $\operatorname{var}(\bar{y}) = \frac{\sigma^2}{n}$, the first step of
minimising $R$ leads to

$$\min_{\hat{\mu} \in D_E} c_1 \operatorname{var}(\hat{\mu}) = \frac{c_1\sigma^2}{n}$$

and

$$R(\hat{\mu}^*, n, \mu, Kn) = c_2 Kn + \frac{c_1}{n}\sigma^2.$$

If we differentiate the right-hand side of the equation with respect to $n$ and put the
derivative equal to 0, then we get $n^* = \sigma\sqrt{\frac{c_1}{Kc_2}}$, and this as well as $\bar{y}$ does not
depend on $\psi = \mu$. The convexity of the considered function shows that we have
indeed found a (global) minimum. Hence, the $R$-optimal solution of the decision
problem in $E \times Z$ (which is global in $Z$, but only local in $\Omega$ because of the
dependence on $\sigma$) is given by

$$M^* = \left(\bar{y},\ n^* = \sigma\sqrt{\frac{c_1}{Kc_2}}\right).$$
If we choose $\psi = g(\theta) = \sigma^2 > 0$, then we obtain $E = R^+$, $k = 1$, $A_1 = Y$ and $N = n$.
Let the loss function be

$$L[d(Y), \sigma^2, f(M)] = c_1\left(\sigma^2 - d(Y)\right)^2 + c_2 nK, \quad c_i > 0, \ K > 0.$$

If we take again

$$R(d(Y), n, \sigma^2, Kn) = R = E(L) = c_1 E\left\{\left(\sigma^2 - d(Y)\right)^2\right\} + c_2 nK$$

as risk function, it is of the form (1.28). If we restrict ourselves to $d \in D_E$ for
reasons analogous to those in the previous case such that $E[d(Y)] = \sigma^2$ holds, then the first
summand of $R$ is minimal for

$$d(Y) = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2,$$

which will be seen in Chapter 2.
Since $\frac{s^2}{\sigma^2}(n-1)$ is distributed as CS($n - 1$) and has therefore the variance
$2(n - 1)$, we find

$$\operatorname{var}(s^2) = \frac{2\sigma^4}{n-1}.$$

The first step of optimisation supplies

$$R(s^2, n, \sigma^2, Kn) = c_1\frac{2\sigma^4}{n-1} + c_2 nK.$$

The $R$-optimal $n$ is given by

$$n^* = 1 + \sigma^2\sqrt{\frac{2c_1}{Kc_2}}.$$

The locally $R$-optimal solution of the decision problem is

$$M^* = \left(s^2,\ n^* = 1 + \sigma^2\sqrt{\frac{2c_1}{Kc_2}}\right).$$


We consider more detailed theory and further applications in the next
chapters where the selection of minimal sample sizes is discussed. We
assume that, for fixed $S_k$ and $N_k$, $d$ has to be chosen $R$-optimally relative to
a certain risk function. Concerning the optimal choice of $\begin{pmatrix} S_k \\ N_k \end{pmatrix}$ we refer to
Chapters 8 and 9 treating regression analysis. We write therefore with
$\tau$ from Definition 1.13

$$R(d,\psi) = E\{L[d(Y),\psi]\} = r(d,\tau). \tag{1.30}$$


In Example 1.14 a restriction to a subset $D_E \subset D$ was carried out to avoid
trivial locally $R$-optimal decision functions $d$. Now two other general procedures
are introduced to overcome such problems.


Definition 1.13 Let $\theta$ be a random variable with realisations $\theta \in \Omega$ and the
probability distribution $P_\tau$, $\tau \in T$. Let the expectation

$$\int_{\Omega} R(d,\psi)\,dP_\tau = r(d,\tau) \tag{1.31}$$

of (1.30) exist relative to $P_\tau$, which is called Bayesian risk relative to the a priori
distribution $P_\tau$.


A decision function $d_0(Y)$ that fulfils

$$r(d_0,\tau) = \inf_{d \in D}[r(d,\tau)]$$

is called Bayesian decision function relative to the a priori distribution $P_\tau$.
Definition 1.14 A decision function $d_0 \in D$ is said to be a minimax decision
function if

$$\max_{\theta \in \Omega} R(d_0,\psi) = \min_{d \in D}\max_{\theta \in \Omega} R(d,\psi). \tag{1.32}$$


Definition 1.15 Let $d_1, d_2 \in D$ be two decision functions for a certain decision
problem with the risk function $R(d, \psi)$ where $\psi = g(\theta)$, $\theta \in \Omega$. Then $d_1$ is said not
to be worse than $d_2$, if $R(d_1, \psi) \le R(d_2, \psi)$ holds for all $\theta \in \Omega$. The function $d_1$ is
said to be better than the function $d_2$ if apart from $R(d_1, \psi) \le R(d_2, \psi)$ for all $\theta \in \Omega$
at least for one $\theta^* \in \Omega$ the strict inequality $R(d_1, \psi^*) < R(d_2, \psi^*)$ holds where
$\psi^* = g(\theta^*)$. A decision function $d$ is called admissible in $D$ if there is no decision
function in $D$ better than $d$. If a decision function is not admissible, it is called
inadmissible.
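
For a finite state space and finitely many decision functions, Definitions 1.13 to 1.15 reduce to operations on a risk table. The sketch below is our own illustration: the table `risks` and the a priori distribution `prior` are entirely hypothetical and are not taken from the text.

```python
# Hypothetical risk table R(d, theta): one row per decision function,
# one column per state theta_1, theta_2.
risks = {
    "d1": [1.0, 4.0],
    "d2": [3.0, 2.0],
    "d3": [3.5, 4.5],
}
prior = [0.7, 0.3]   # hypothetical a priori distribution over the two states

def bayes_risk(row, prior):
    """Expectation of the risk relative to the a priori distribution, as in (1.31)."""
    return sum(r * w for r, w in zip(row, prior))

bayes_rule = min(risks, key=lambda d: bayes_risk(risks[d], prior))
minimax_rule = min(risks, key=lambda d: max(risks[d]))

def better(a, b):
    """True if risk row a is better than risk row b in the sense of Definition 1.15."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

admissible = [d for d in risks
              if not any(better(risks[e], risks[d]) for e in risks if e != d)]
print(bayes_rule, minimax_rule, admissible)
```

In this table the Bayesian and the minimax decision functions differ, and `d3` is inadmissible because `d2` is better in both states.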


In this chapter it is not necessary to develop the decision theory further. In
Chapter 2 we will consider the theory of point estimation, where $d(Y) = S(Y)$
is the decision function. In the theory of testing in Chapter 3, the decision
function $d(Y)$ is the probability for the rejection of a null hypothesis, and in
confidence interval estimation it is a domain in $\Omega$ covering the value $\theta$ of the
distribution $P_\theta$ with a given probability. Selection rules and multiple
comparison methods are further special cases of decision functions.


1.6 Exercises 


1.1 For estimating the average income of the inhabitants of a city, the income 
of owners of each 20th private line in a telephone directory is determined. 
Is this sample a random sample with respect to the whole city population? 


1.2 A set with elements 1, 2, 3 is considered. Selecting elements with replacement,
there are $3^4 = 81$ different samples of size $n = 4$. Write down all
possible samples, calculate $\bar{y}$ and $s^2$ and present the frequency distribution
of $\bar{y}$ and $s^2$ graphically as a bar chart. (You may use a program package.)
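
One possible way to organise the computation for Exercise 1.2 is sketched below. It is only a starting point under our own assumptions: the helper name `mean_and_s2` is hypothetical, exact fractions are used to avoid rounding when counting equal values, and the bar chart is replaced by a simple text display.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered samples of size 4 drawn with replacement from {1, 2, 3}.
samples = list(product([1, 2, 3], repeat=4))

def mean_and_s2(sample):
    """Sample mean and sample variance s^2 (divisor n - 1) as exact fractions."""
    n = len(sample)
    ybar = Fraction(sum(sample), n)
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    return ybar, s2

mean_counts = Counter(mean_and_s2(s)[0] for s in samples)
s2_counts = Counter(mean_and_s2(s)[1] for s in samples)

# Text version of the bar chart for the sample mean.
for value, count in sorted(mean_counts.items()):
    print(f"ybar = {float(value):.2f}: {'#' * count}")
```

The same loop over `s2_counts` gives the frequency distribution of $s^2$; a plotting package can be substituted for the text display.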


1.3 Prove that the statistic $M(Y)$ is sufficient relative to $\theta$, where $Y = (y_1, y_2, \ldots, y_n)^T$,
$n \ge 1$ is a random sample from a population with distribution $P_\theta$, $\theta \in \Omega$,
by determining the conditional distribution of $Y$ for given $M(Y)$.


a) $M(Y) = \sum_{i=1}^{n} y_i$ and $P_\theta$ is the Poisson distribution with the parameter
$\theta \in \Omega \subset R^+$.

b) $M(Y) = (y_{(1)}, y_{(n)})^T$ and $P_\theta$ is the uniform distribution in the interval
$(\theta, \theta + 1)$ with $\theta \in \Omega \subset R^1$.

c) $M(Y) = y_{(n)}$ and $P_\theta$ is the uniform distribution in the interval $(0, \theta)$ with
$\theta \in \Omega = R^+$.

d) $M(Y) = \sum_{i=1}^{n} y_i$ and $P_\theta$ is the exponential distribution with the parameter
$\theta \in \Omega = R^+$.
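1.4 Let $Y = (y_1, y_2, \ldots, y_n)^T$, $n \ge 1$ be a random sample from a population with
the distribution $P_\theta$, $\theta \in \Omega$. Determine a sufficient statistic with respect to $\theta$
using Corollary 1.1 of the decomposition theorem if $P_\theta$, $\theta \in \Omega$ is given by the density
function


a) $f(y,\theta) = \theta y^{\theta-1}$, $0 < y < 1$; $\theta \in \Omega = R^+$,
b) Of the Weibull distribution

$$f(y,\theta) = \theta a(\theta y)^{a-1} e^{-(\theta y)^a}, \quad y \ge 0, \ \theta \in \Omega = R^+, \ a > 0 \text{ known}$$

c) Of the Pareto distribution

$$f(y,\theta) = \frac{\theta a^\theta}{y^{\theta+1}}, \quad y > a > 0, \ \theta \in \Omega = R^+, \ a \text{ known}$$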

1.5 Determine a minimal sufficient statistic $M(Y)$ for the parameter $\theta$, if
$Y = (y_1, y_2, \ldots, y_n)^T$, $n \ge 1$ is a random sample from a population with
the following distribution $P_\theta$:


a) Geometric distribution with the probability function

$$p(y,p) = p(1-p)^{y-1}, \quad y = 1, 2, \ldots, \ 0 < p < 1$$

b) Hypergeometric distribution with the probability function

$$p(y, M, N, n) = \frac{\binom{M}{y}\binom{N-M}{n-y}}{\binom{N}{n}}, \quad n \in \{1, \ldots, N\}, \ y \in \{0, \ldots, N\}; \ M \le N \text{ integer},$$

c) Negative binomial distribution with the probability function

$$p(y,p,r) = \binom{y-1}{r-1} p^r (1-p)^{y-r}, \quad 0 < p < 1, \ y \ge r \text{ integer}, \ r \in \{0, 1, \ldots\}$$

and
i) $\theta = p$ and $r$ known;
ii) $\theta^T = (p, r)$,


d) Beta distribution with the density function

$$f(y,\theta) = \frac{1}{B(a,b)}\, y^{a-1}(1-y)^{b-1}, \quad 0 < y < 1, \ 0 < a, b < \infty$$

and
i) $\theta = a$ and $b$ known;
ii) $\theta = b$ and $a$ known
1.6 Prove that the following distribution families $\{P_\theta, \theta \in \Omega\}$ are complete:


a) $P_\theta$ is the Poisson distribution with the parameter $\theta \in \Omega = R^+$.
b) $P_\theta$ is the uniform distribution in the interval $(0, \theta)$, $\theta \in \Omega = R^+$.


1.7 Let $Y = (y_1, y_2, \ldots, y_n)^T$, $n \ge 1$ be a random sample, whose components
are uniformly distributed in the interval $(0, \theta)$, $\theta \in \Omega = R^+$. Show that
$M(Y) = y_{(n)}$ is complete sufficient.


1.8 Let the variable $y$ have the discrete distribution $P_\theta$ with the probability
function

$$p(y,\theta) = P(y = y) = \begin{cases} \theta & \text{for } y = -1 \\ (1-\theta)^2\theta^y & \text{for } y = 0, 1, 2, \ldots \end{cases}$$

Show that the corresponding distribution family with $\theta \in (0, 1)$ is
bounded complete, but not complete.


1.9 Let a one-parametric exponential family with the density or probability
function

$$f(y,\theta) = h(y)\, e^{\eta(\theta)M(y) - B(\theta)}, \quad \theta \in \Omega$$

be given.


a) Express the Fisher information of this distribution by using the functions
$\eta(\theta)$ and $B(\theta)$.

b) Use the result of a) to calculate $I(\theta)$ for the

i) Binomial distribution with the parameter $\theta = p$

ii) Poisson distribution with the parameter $\theta = \lambda$

iii) Exponential distribution with the parameter $\theta$

iv) Normal distribution $N(\mu, \sigma^2)$ with $\theta = \sigma$ and $\mu$ fixed


1.10 Let the assumptions of Definition 1.11 be fulfilled. Besides, the second
partial derivatives $\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} L(y,\theta)$ are to exist for all $i, j = 1, \ldots, p$ and $y \in \{Y\}$
as well as their expectations for random $y$. Moreover, let $\int_{\{Y\}} L(y, \theta)\,dy$
be twice differentiable, where integration and differentiation are
commutative.


Prove that in this case the elements of the information matrix in
Definition 1.11 have the form

a) $I_{ij}(\theta) = \operatorname{cov}\left\{\frac{\partial}{\partial\theta_i}\ln L(y,\theta), \frac{\partial}{\partial\theta_j}\ln L(y,\theta)\right\}$,

b) $I_{ij}(\theta) = -E\left\{\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\ln L(y,\theta)\right\}$,

respectively.
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1.11 Let Y=(y1, 72, ...,¥,)" be a random sample from a population with the 
distribution Py, € Q and M(Y) a given statistic. 
Calculate E[M(Y)], var[M(Y)], the Fisher information J(0) of the distri- 
bution and the Rao—Cramér bound for var[M(Y)]. 
Does equality hold in the inequality of Rao and Cramér under the fol- 
lowing assumptions? 


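a) $P_\theta$ is the Poisson distribution with the parameter $\theta \in R^+$ and

$$M(y) = \begin{cases} 1 & \text{for } y = 0 \\ 0 & \text{else} \end{cases}$$

(here we have $n = 1$, i.e. $y = Y$).

b) $P_\theta$ is the Poisson distribution with the parameter $\theta \in R^+$ and

$$M(Y) = \left(1 - \frac{1}{n}\right)^{n\bar{y}}$$

(generalisation of a) for the case $n > 1$).

c) $P_\theta$ is the distribution with the density function

$$f(y,\theta) = \theta y^{\theta-1},\quad 0<y<1,\ \theta \in R^+$$

and $M(Y) = -\frac{1}{n}\sum_{i=1}^n \ln y_i$.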

1.12 In a certain region it is intended to drill for oil. The owner of the drilling
rights has to decide between strategies from $\{E_1, E_2, E_3\}$.
The following notations are introduced:

E_1: The drilling is carried out under the owner's own direction.
E_2: The drilling rights are sold.
E_3: A part of the drilling rights is alienated.

It is not known so far whether there really is an oil deposit in the region.
Further let $\Omega = \{\theta_1, \theta_2\}$ with the following meanings:

θ = θ_1: Oil occurs in the region.
θ = θ_2: Oil does not occur in the region.

The loss function $L(d,\theta)$ has for the decisions $d = E_i$, $i = 1,2,3$, and
$\theta = \theta_j$, $j = 1,2$, the form

           E_1    E_2    E_3
θ_1          0     10      5
θ_2         12      1      6

The decision is made considering experts' reports related to the
geological situation in the region. We denote the result of the reports by
$y \in \{0, 1\}$.

Let $p_\theta(y)$ be the probability function of the random variable $y$, in
dependence on $\theta$, with values

           y = 0   y = 1
θ_1         0.3     0.7
θ_2         0.6     0.4

Therefore the variable $y$ states the information obtained by the 'random
experiment' of geological reports about existing ($y = 1$) or missing ($y = 0$)
deposits of oil in the region. Let the set $D$ of decision functions $d_i(y)$
contain all possible $3^2$ discrete functions:

 i         1     2     3     4     5     6     7     8     9
d_i(0)    E_1   E_1   E_1   E_2   E_2   E_2   E_3   E_3   E_3
d_i(1)    E_1   E_2   E_3   E_1   E_2   E_3   E_1   E_2   E_3

a) Determine the risk $R(d(y),\theta) = E_\theta\{L[d(y),\theta]\}$ for all 18 cases given
above.

b) Determine the minimax decision function.

c) Following the opinion of experts in the field of drilling technology, the
probability of finding oil after drilling in this region is approximately
0.2. Then $\theta$ can be considered as a random variable with the probability
function

θ          θ_1     θ_2
π(θ)       0.2     0.8

Determine for each decision function the Bayesian risk $r(d_i, \pi)$ and then
the Bayesian decision function.
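
Parts a) to c) of Exercise 1.12 amount to a small finite computation. The following sketch is not part of the original exercise; it enumerates the nine non-randomised decision functions and evaluates their risks exactly. The loss and probability values are those read from the two tables of the exercise, and the names `LOSS`, `PROB`, `PRIOR` and `risk` are illustrative assumptions.

```python
from fractions import Fraction as F
from itertools import product

# Loss L(E_j, theta_i): key 1 = theta_1 (oil), key 2 = theta_2 (no oil).
# Values as read from the loss table of the exercise (assumed correct).
LOSS = {1: {"E1": F(0), "E2": F(10), "E3": F(5)},
        2: {"E1": F(12), "E2": F(1), "E3": F(6)}}
# p_theta(y) for the expert report y in {0, 1}.
PROB = {1: {0: F(3, 10), 1: F(7, 10)},
        2: {0: F(6, 10), 1: F(4, 10)}}
# A priori probabilities pi(theta_1), pi(theta_2) from part c).
PRIOR = {1: F(2, 10), 2: F(8, 10)}

# Decision functions d_1..d_9 as pairs (d(0), d(1)), in the order of the table.
DECISIONS = list(product(["E1", "E2", "E3"], repeat=2))

def risk(d, theta):
    """R(d, theta) = sum over y of p_theta(y) * L(d(y), theta)."""
    return sum(PROB[theta][y] * LOSS[theta][d[y]] for y in (0, 1))

risks = [(risk(d, 1), risk(d, 2)) for d in DECISIONS]
max_risks = [max(r) for r in risks]
bayes_risks = [PRIOR[1] * r1 + PRIOR[2] * r2 for r1, r2 in risks]

# 1-based labels of the minimax and the Bayesian decision function.
minimax_index = max_risks.index(min(max_risks)) + 1
bayes_index = bayes_risks.index(min(bayes_risks)) + 1

for i, (d, r) in enumerate(zip(DECISIONS, risks), start=1):
    print(i, d, float(r[0]), float(r[1]), float(bayes_risks[i - 1]))
print("minimax:", minimax_index, "Bayes:", bayes_index)
```

Exact fractions are used so that ties would be detected reliably. The search covers only the nine non-randomised decision functions listed in the exercise.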


1.13 The strategies of treatment using two different drugs $M_1$ and $M_2$ are to be
assessed. Three strategies are at the disposal of doctors:

E_1: Treatment with the drug $M_1$ increasing blood pressure
E_2: Treatment without using drugs
E_3: Treatment with the drug $M_2$ decreasing blood pressure

Let the variable $\theta$ characterise the (suitably transformed) blood pressure
of a patient such that $\theta < 0$ indicates too low blood pressure, $\theta = 0$ normal
blood pressure and $\theta > 0$ too high blood pressure. The loss function is
defined as follows:

           E_1     E_2     E_3
θ < 0       0       c      b+c
θ = 0       b       0       b
θ > 0      b+c      c       0

The blood pressure of a patient is measured. Let the measurement $y$ be
distributed as $N(\theta, 1)$ and let it be taken $n$ times independently:
$Y = (y_1, y_2, \ldots, y_n)^T$. Based on this sample the decision function

$$d_{r,s}(\bar{y}) = \begin{cases} E_1, & \text{if } \bar{y} < r \\ E_2, & \text{if } r \le \bar{y} \le s \\ E_3, & \text{if } \bar{y} > s \end{cases}$$

is defined.

a) Determine the risk $R(d_{r,s}(\bar{y}),\theta) = E\{L[d_{r,s}(\bar{y}),\theta]\}$.
b) Sketch the risk function in the case $b = c = 1$, $n = 1$ for
i) $r = -s = -1$;
ii) $r = -\frac{1}{2}s = -1$.

For which values of $\theta$ should the decision function $d_{-1,1}(\bar{y})$ be preferred
to the function $d_{-1,2}(\bar{y})$?

2 Point Estimation


In this chapter we consider so-called point estimations. The problem can be
described as follows. Let the distribution $P_\theta$ of a random variable $\boldsymbol{y}$ belong to
a family $P = (P_\theta, \theta \in \Omega)$, $\Omega \subset R^p$, $p \ge 1$. With the help of a realisation $Y$ of a
random sample $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T$, $n \ge 1$, a statement is to be given concerning the
value of a prescribed real function $\psi = g(\theta) \in Z$. Often $g(\theta) = \theta$. Obviously the
statement about $g(\theta)$ should be as precise as possible. What this means
depends on the choice of the loss function defined in Section 1.5. We
define a statistic $M(\boldsymbol{Y})$ taking the value $M(Y)$ for $\boldsymbol{Y} = Y$, where the random
variable $M(\boldsymbol{Y})$ is called the estimator and its realisation $M(Y)$ the estimate of
$\psi = g(\theta)$. Here bold symbols denote random quantities and plain symbols their
realisations.

The notation 'point estimation' reflects the fact that each realisation $M(Y)$ of
$M(\boldsymbol{Y})$ defines a point in the space $Z$ of possible values of $g(\theta)$.

The problem of interval estimators is discussed in Chapter 3 following the
theory of testing.

By $L[g(\theta), M(Y)] = L(\psi, M)$, we denote a loss function taking the value $L(\psi_0, M)$
if $\psi$ takes the value $\psi_0$ and $\boldsymbol{Y}$ the value $Y$ (i.e. $M(\boldsymbol{Y})$ takes the value
$M = M(Y)$).

Although many statements in this chapter can be generalised to arbitrary
convex loss functions, we mainly use the most convenient loss function, the
quadratic loss function without costs. Unless explicitly stated otherwise,
our loss function

$$L(\psi, M) = \|\psi - M\|^2,\quad \psi \in Z,\ M \in D \tag{2.1}$$

is the square of the $L_2$-norm of the vector $\psi - M$, supposing that it is
$P_\theta$-integrable. Then we define the risk function as the expectation

$$R(\psi, M) = E\left(\|\psi - M\|^2\right) = \int_{\{Y\}} \|\psi - M(Y)\|^2 \, dP_\theta \tag{2.2}$$

of the random loss. Here $R(\psi, M)$ is the risk (the expected or mean loss)
occurring if the statistic $M(Y) \in D$ is used to estimate $\psi = g(\theta) \in Z$. We will come back
later to the problem of finding a suitable set $D$ of statistics. First we want to
assure by the following definition that the difference $\psi - M$ makes sense.


Definition 2.1 Let $Y = (y_1, y_2, \ldots, y_n)^T$ be a random sample of size $n \ge 1$ with
components $y_i$ whose distribution $P_\theta$ is from the family $P = (P_\theta, \theta \in \Omega)$. A statistic
$S = S(Y)$ is said to be an estimator (in the stronger sense), or also an estimation, with
respect to the real function $g(\theta) = \psi$ with $\psi \in Z = g(\Omega)$, if $S$ maps the sample
space into a subset of $Z$. By $D$ we denote the set of all estimators with respect
to $g(\theta)$ based on samples of size $n$.


Two remarks should be made concerning Definition 2.1. 

First, if we look for optimal estimators, we always suppose that $n$ is fixed and
not itself a variable of the optimisation problem. Therefore we assume that both
$n$ and $S \in D$ can be chosen separately optimal considering the total optimisation
process according to Section 1.5. Hence, if we speak about 'the estimator', we
mean the estimator for a fixed $n$. For example, the arithmetic mean

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

is an estimator for each $n$. Sometimes, however, we want to make statements about the
asymptotic behaviour as $n$ grows, for example, in the case of the arithmetic mean. Then we
consider the sequence $\{S(Y_n)\}$ of estimators $S(Y_n)$ with $n = 1, 2, \ldots$, for example, the
sequence $\left\{\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i\right\}$ of the arithmetic means. For short we keep to the
common speech that 'the arithmetic mean is consistent' instead of the more
precise expression that 'the sequence of the arithmetic means is consistent'.

Second, demanding that $S$ is an estimator only if $S$ maps the space $\{Y\}$
measurably into a space $\{M(Y)\} \subset Z$ is sometimes too restrictive. In older
publications, statistics $M$ are also admitted as estimators if $Z \subset \{M(Y)\}$ and
$\dim(Z) = \dim(\{M(Y)\})$ are fulfilled. Such cases often occurred in model II of
analysis of variance (ANOVA) (Chapter 6). Variance components estimated by the
ANOVA method can also take negative values. In this book we do not call such
procedures estimators and keep to Definition 2.1.

In non-linear regression we also speak of estimators if the corresponding 
mapping is not measurable. We call such statistics estimators in a weaker sense. 

It could be suggested to declare the aim of the theory of estimation as finding
such estimators that are $R(\psi, S)$-optimal (i.e. that minimise the value of $R(\psi, S)$
among all $S(Y) \in D$). But, since $R(\psi, S)$ is minimal, namely, equal to 0 for $\psi = \psi_0$, if
we put $S(Y) = \psi_0$ for all $Y \in \{Y\}$, a problem stated in this way has no solution
that is uniformly $R$-optimal (i.e. for all $\psi \in Z$). This dilemma can be eliminated,
as already described in Section 1.5, either by restricting to a subset $D_0 \subset D$ and
looking for $R$-optimal estimators in this subset $D_0$ or, analogously to the
Bayesian approach, by minimising a weighted risk, the so-called Bayesian risk

$$R_\lambda(S) = \int_{Z} R(\psi, S)\, dP_\lambda \tag{2.3}$$

with respect to a measure $P_\lambda$ standardised to 1, where $P_\lambda$ ($\lambda \in K$) is chosen as a
weight function, which has an existing integral according to (2.3) and which
moreover measures the 'importance' of single $\theta$-values. For random $\theta$ the weight
$P_\lambda$ is the probability measure of the random variable $g(\theta)$, that is, the a priori
distribution of $\psi = g(\theta)$.

Finally there is a third approach that is often used. Here we look for a minimax
estimator $\tilde{S}(Y)$ satisfying

$$\max_{\psi \in Z} R(\psi, \tilde{S}) = \min_{S \in D}\, \max_{\psi \in Z} R(\psi, S). \tag{2.4}$$

We use the first approach in this book, as already indicated in Section 1.5. We
consider in Section 2.1 the subset $D_{UE} = D_0 \subset D$ of unbiased estimators; further
we restrict ourselves to linear ($D_L$), linear unbiased ($D_{LE}$), quadratic ($D_Q$) or
quadratic unbiased ($D_{QE}$) estimators.


2.1 Optimal Unbiased Estimators 


We suppose that all estimators $S$ used in this chapter are $P_\theta$-integrable, which
means that for each $P_\theta \in P = (P_\theta, \theta \in \Omega)$ and for each $S$ the expectation

$$E[S(Y)] = \int_{\{Y\}} S(Y)\, dF_\theta(Y) \tag{2.5}$$

exists. Here $F_\theta(Y)$ is the distribution function of the random sample
$Y = (y_1, y_2, \ldots, y_n)^T$ (and therefore the distribution function of the product measure
of the distributions $P_\theta$ belonging to the $y_i$).


Definition 2.2 An estimator $S(Y)$ based on a random sample $Y = (y_1, y_2, \ldots, y_n)^T$
of size $n \ge 1$ is said to be unbiased with respect to $\psi = g(\theta)$ if

$$E[S(Y)] = g(\theta) \tag{2.6}$$

holds for all $\theta \in \Omega$. We denote the class of unbiased estimators of an estimation
problem by $D_{UE}$. The difference $v_n(\theta) = E[S(Y)] - g(\theta)$ is called the bias of $S(Y)$.

A statistic $U(Y)$ is said to be unbiased with respect to 0 if

$$E[U(Y)] = 0 \tag{2.7}$$

for all $\theta \in \Omega$.

Naturally the expression 'for all $\theta \in \Omega$' in the definitions and theorems always
means 'for all $P_\theta \in P$', that is, more precisely for all measures $P_\theta$ with existing
integral in (2.5). First we show by an example that there are problems of
estimation with a non-empty class $D_{UE}$.

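Example 2.1 Let the components $y_i$ of the random sample $Y = (y_1, y_2, \ldots, y_n)^T$
be distributed as $N(\mu, \sigma^2)$. Then $\theta = (\mu, \sigma^2)^T$. Let $\psi_1 = g_1(\theta) = (1\ 0)\,\theta = \mu$ and
$\psi_2 = g_2(\theta) = (0\ 1)\,\theta = \sigma^2$. We consider

$$S_1(Y) = \bar{y} \quad\text{and}\quad S_2(Y) = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = s^2.$$

We know that $\bar{y}$ is distributed as $N\left(\mu, \frac{\sigma^2}{n}\right)$ and $X^2 = \frac{(n-1)s^2}{\sigma^2}$ as $CS(n-1)$.

Consequently we have $E(\bar{y}) = \mu$ (for all $\theta$) and $E(s^2) = \sigma^2$ because $E(X^2) = n - 1$.
Hence, $\bar{y}$ is unbiased with respect to $\mu$ and $s^2$ is unbiased with respect to $\sigma^2$.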
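
The unbiasedness of $s^2$ in Example 2.1 can be illustrated by simulation. The sketch below is an illustration only, not part of the original text; the parameter values, the seed and the helper names `sample_variance` and `average_estimate` are arbitrary assumptions. It averages the sum of squares divided by $n-1$ and, for contrast, divided by $n$, over many normal samples.

```python
import random

def sample_variance(ys, divisor_offset=1):
    """Sum of squared deviations divided by (n - divisor_offset)."""
    n = len(ys)
    mean = sum(ys) / n
    ss = sum((y - mean) ** 2 for y in ys)
    return ss / (n - divisor_offset)

def average_estimate(mu, sigma, n, reps, divisor_offset, seed=1):
    """Monte Carlo average of the variance estimator over `reps` samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        ys = [rng.gauss(mu, sigma) for _ in range(n)]
        total += sample_variance(ys, divisor_offset)
    return total / reps

# sigma^2 = 4; the divisor n - 1 should average close to 4,
# the divisor n close to (n - 1)/n * 4 = 3.2 for n = 5.
unbiased = average_estimate(mu=10.0, sigma=2.0, n=5, reps=20000, divisor_offset=1)
biased = average_estimate(mu=10.0, sigma=2.0, n=5, reps=20000, divisor_offset=0)
print(unbiased, biased)
```

Both runs use the same seed and therefore the same samples, so the two averages differ exactly by the factor $(n-1)/n$.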


However, there are problems of estimation possessing no unbiased estima- 
tors. This is shown in the next example. 


Example 2.2 Let the random variable $Y = y$ be distributed as $B(n, p)$ with $0 < p < 1$.
Let $n$ be known and $\psi = g(p) = 1/p$. The sample space is $\{Y\} = \{0, 1, \ldots, n\}$.
Assuming that there is an unbiased estimator $S(y)$ with respect to $1/p$, the
expectation

$$E[S(y)] = \sum_{y=0}^{n} \binom{n}{y} p^{y}(1-p)^{n-y} S(y)$$

would be $1/p$. But this is not possible, because $E[S(y)]$ tends to $S(0)$ for $p \to 0$
while $1/p$ tends to infinity for $p \to 0$.
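
The argument of Example 2.2 can be made concrete with exact arithmetic. In the following sketch the candidate `S` is a hypothetical choice made only for illustration (any function on $\{0,\ldots,n\}$ would behave in the same way): its expectation stays bounded by its largest value, while $1/p$ grows without bound as $p \to 0$.

```python
from fractions import Fraction
from math import comb

def expectation(S, n, p):
    """Exact E_p[S(y)] for y distributed as B(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) * S(y) for y in range(n + 1))

n = 4
# Hypothetical candidate estimator of 1/p; bounded by n + 1 on the sample space.
S = lambda y: Fraction(n + 1, y + 1)

for p in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    print(p, float(expectation(S, n, p)), float(1 / p))
```

For this particular candidate, $E_p[S(y)] = [1-(1-p)^{n+1}]/p$, which is close to $1/p$ for moderate $p$ but converges to $S(0) = n+1$ as $p \to 0$.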


The following statement is obvious. 


Theorem 2.1 If $S_0(Y)$ is an unbiased estimator with respect to $\psi = g(\theta)$, then
each other unbiased estimator $S(Y)$ with respect to $\psi$ has the form

$$S(Y) = S_0(Y) - U(Y) \tag{2.8}$$

where $U(Y)$ is an unbiased statistic with respect to 0.

We want to use this theorem to find $R(\psi, S)$-optimal estimators $S(Y) \in D_{UE}$. First
we see that

$$R(\psi, S) = \operatorname{var}(S)$$

is true for $S(Y) \in D_{UE}$, that is, the variance-optimal unbiased estimator has to be
found. Assuming that $S_0$ is an unbiased estimator with respect to $g$ and that $S_0$, $S$
and $U$ have a finite variance, then

$$\operatorname{var}(S) = \operatorname{var}(S_0 - U) = E\left[(S_0 - U)^2\right] - \psi^2. \tag{2.9}$$

Hence, we can find the variance-optimal estimator by minimising $E[(S_0 - U)^2]$.
We want to demonstrate this approach in the next example.


Example 2.3 Let $Y = y$ where $y$ takes the values $-1, 0, 1, \ldots$ with the
probabilities

$$P(y = -1) = p,\quad P(y = y) = p^{y}(1-p)^2 \ \text{ for } y = 0, 1, \ldots$$

where $0 < p = \theta < 1$. Since

$$p + \sum_{k=0}^{\infty} p^{k}(1-p)^2 = 1,$$

a distribution is defined.

If $U(y) = -y\,U(-1)$ for $y = 0, 1, \ldots$ and $U(-1) \in R^1$, then $U(y)$ is unbiased with
respect to 0.

This can be seen from

$$\sum_{k=1}^{\infty} k x^{k-1} = \frac{1}{(1-x)^2} \ \text{ for } |x| < 1$$

and

$$E[U(y)] = U(-1)\left[p + 0 - (1-p)^2 \sum_{y=1}^{\infty} y p^{y}\right] = 0.$$

Conversely, $E[U(y)] = 0$ implies $U(y) = -y\,U(-1)$ for $y = 0, 1, \ldots$, namely, in

$$p\,U(-1) + (1-p)^2 U(0) + (1-p)^2 p \sum_{y=1}^{\infty} U(y) p^{y-1} = 0,$$

the series converges for $U(y) = y \cdot \text{const}$. Indeed, dividing by $(1-p)^2$ and using
$p/(1-p)^2 = \sum_{y \ge 1} y p^{y}$ gives the power series identity
$\sum_{y \ge 0} [U(y) + y\,U(-1)]\,p^{y} = 0$ for all $p \in (0, 1)$, so that every coefficient
must vanish. Hence, the solution is $U(0) = 0$, $U(y) = -y\,U(-1)$.
Otherwise the series does not converge or converges in dependence on $p$.

Now we consider two special cases:

a) Let $\psi = g(p) = p$. Then, for example, $S_0(y)$ with

$$S_0(y) = \begin{cases} 1 & \text{for } y = -1 \\ 0 & \text{else} \end{cases}$$

is unbiased with respect to $p$, and $S(y)$ in (2.8) is a variance-optimal unbiased
estimator if it minimises

$$Q = \sum_{y=-1}^{\infty} P(y = y)\left[S_0(y) + y\,U(-1)\right]^2$$

because of (2.9). Fixing $p = p_0$ we get for $Q$

$$Q_0 = p_0\left[1 - U(-1)\right]^2 + \sum_{y=1}^{\infty} \left[y\,U(-1)\right]^2 p_0^{y}(1-p_0)^2.$$

By differentiating $Q_0$ with respect to $U(-1)$ and putting the derivative equal to 0, we get
as variance-optimal value (the second derivative with respect to $U(-1)$ is positive) the
minimum at

$$U(-1) = \frac{1-p_0}{2},$$

where $\sum_{y \ge 1} y^2 x^{y} = x(1+x)/(1-x)^3$ has been used; that is, there is only one
variance-optimal unbiased estimator, and it depends on the parameter value $p_0$.

b) The situation is more favourable if we consider another function $g(p)$.
Let $\psi = g(p) = (1-p)^2$. Therefore we have to estimate $(1-p)^2$ (and not $p$
itself) unbiasedly. An unbiased estimator is, for example, $S_0(y)$ with

$$S_0(y) = \begin{cases} 1 & \text{for } y = 0 \\ 0 & \text{else.} \end{cases}$$

Naturally, as an unbiased estimation of 0, $U(y)$ is the same for all functions $g$,
and analogously to case (a), we want to determine the minimum of

$$Q_0 = p_0\left[U(-1)\right]^2 + (1-p_0)^2\, 1^2 + (1-p_0)^2 \sum_{y=1}^{\infty} \left[y\,U(-1)\right]^2 p_0^{y}.$$

Again the second derivative of $Q_0$ with respect to $U(-1)$ is positive. Now the
minimum is at $U(-1) = 0$. Consequently $S(y) = S_0(y)$ is the variance-optimal
unbiased estimator for $(1-p)^2$ with respect to each $p_0 \in (0, 1)$.
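
The two minimisations in Example 2.3 can be checked numerically. The sketch below is an illustration under stated assumptions: the infinite series is truncated after `terms` summands, the minimiser is obtained from the fact that $Q_0$ is a quadratic function of $u = U(-1)$, and the closed form $(1-p_0)/2$ printed for comparison in case (a) is derived from $\sum_{y \ge 1} y^2 x^{y} = x(1+x)/(1-x)^3$ rather than quoted from the text. The function names are hypothetical.

```python
def q0(u, p0, case, terms=2000):
    """Truncated E[(S0 - U)^2] at p = p0 for U(y) = -y*u with u = U(-1)."""
    if case == "a":
        # S0 = 1 at y = -1, 0 else (estimates p); the term at y = 0 vanishes.
        total = p0 * (1.0 - u) ** 2
    else:
        # S0 = 1 at y = 0, 0 else (estimates (1 - p)^2).
        total = p0 * u ** 2 + (1.0 - p0) ** 2
    for y in range(1, terms + 1):
        total += (y * u) ** 2 * p0 ** y * (1.0 - p0) ** 2
    return total

def minimiser(p0, case):
    """Vertex of the quadratic u -> q0(u), recovered from three evaluations."""
    fm, f0, fp = q0(-1.0, p0, case), q0(0.0, p0, case), q0(1.0, p0, case)
    c2 = (fp + fm - 2.0 * f0) / 2.0   # coefficient of u^2
    c1 = (fp - fm) / 2.0              # coefficient of u
    return -c1 / (2.0 * c2)

for p0 in (0.2, 0.5, 0.8):
    print(p0, minimiser(p0, "a"), (1.0 - p0) / 2.0, minimiser(p0, "b"))
```

In case (a) the minimiser moves with $p_0$, so the estimator is only locally optimal; in case (b) it is 0 for every $p_0$.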


We want especially to emphasise the property of the estimator in case (b) of
Example 2.3.

Definition 2.3 Let $Y = (y_1, y_2, \ldots, y_n)^T$ be a random sample with components
distributed as $P_\theta \in P = (P_\theta, \theta \in \Omega)$, and let $\tilde{S}(Y)$ be an unbiased estimator with
respect to $g(\theta) = \psi$ with finite variance. Besides, let $D_E \subset D_{UE}$ be the class of all
unbiased estimators with finite positive variance and $D_{UE}$ the class of all unbiased
estimators. If

$$\operatorname{var}_{\theta_0}\left[\tilde{S}(Y)\right] = \min_{S(Y) \in D_E} \operatorname{var}_{\theta_0}[S(Y)],\quad \theta_0 \in \Omega, \tag{2.10}$$

then $\tilde{S}(Y)$ is said to be a locally variance-optimal unbiased estimator (LVUE) at
$\theta = \theta_0$.

Definition 2.4 If (2.10) is satisfied for all $\theta_0 \in \Omega$, then $\tilde{S}(Y)$ is said to be a
uniformly variance-optimal unbiased estimator (UVUE).


The class $D_E$ introduced in Definition 2.3 is used in the same sense also in the
following. The next theorem contains a necessary and sufficient condition for an
estimator to be a UVUE.

Theorem 2.2 Let the components of the random sample $Y = (y_1, y_2, \ldots, y_n)^T$
be distributed as $P_\theta \in P = (P_\theta, \theta \in \Omega)$, and let $S(Y) \in D_E$. Further, let
$D_E^0$ be the class of unbiased estimators with respect to 0 with finite second
moment. Then the condition

$$E_\theta[S(Y)U(Y)] = 0 \ \text{ for all } U(Y) \in D_E^0 \text{ and all } \theta \in \Omega \tag{2.11}$$

is necessary and sufficient for $S(Y)$ to be a UVUE with respect to $g(\theta)$.


Proof: If $S(Y)$ is a UVUE with respect to $g(\theta)$, then $S^*(Y) = S(Y) + \lambda U(Y)$ is
unbiased with respect to $g(\theta)$ for $U(Y) \in D_E^0$, $\theta_0 \in \Omega$ and $\lambda \in R^1$. Moreover

$$\operatorname{var}_{\theta_0}[S^*(Y)] = \operatorname{var}_{\theta_0}[S(Y) + \lambda U(Y)] \ge \operatorname{var}_{\theta_0}[S(Y)] \ \text{ for all } \lambda \in R^1$$

is fulfilled. But then

$$\lambda^2 \operatorname{var}_{\theta_0}[U(Y)] + 2\lambda \operatorname{cov}_{\theta_0}[S(Y), U(Y)] \ge 0 \ \text{ for all } \lambda \in R^1$$

follows. Assuming equality, the quadratic equation in $\lambda$ has the two solutions

$$\lambda_1 = 0,\quad \lambda_2 = -\frac{2\operatorname{cov}_{\theta_0}[S(Y), U(Y)]}{\operatorname{var}_{\theta_0}[U(Y)]}.$$

But the expression on the left-hand side of the inequality is non-negative
for arbitrary $\lambda$ only if the two solutions coincide, that is, if the condition

$$\operatorname{cov}_{\theta_0}[S(Y), U(Y)] = E_{\theta_0}[S(Y)U(Y)] = 0$$

is satisfied. This derivation is independent of the special parameter value $\theta_0$.
Therefore it is true everywhere in $\Omega$.

Conversely, assume that

$$E[S(Y)U(Y)] = 0$$

is fulfilled for all $U(Y) \in D_E^0$. Besides, let $S'(Y)$ be another unbiased estimator
with respect to $g(\theta)$. If $S'(Y)$ is not in $D_E$, that is, if it is in $D_{UE} \setminus D_E$, then trivially
$\operatorname{var}[S(Y)] < \operatorname{var}[S'(Y)]$ holds. Therefore let $S'(Y) \in D_E$. But then $S(Y) - S'(Y) \in D_E^0$
follows since the finite variances of $S(Y) \in D_E$ and $S'(Y) \in D_E$ imply also the finite
variance of the difference $S(Y) - S'(Y)$ by considering

$$\operatorname{var}[S(Y) - S'(Y)] = \operatorname{var}[S(Y)] + \operatorname{var}[S'(Y)] - 2\operatorname{cov}[S(Y), S'(Y)].$$

Namely, with $\operatorname{var}[S(Y)]$ and $\operatorname{var}[S'(Y)]$, the right-hand side of the equation is
finite such that the assertion $S(Y) - S'(Y) \in D_E^0$ follows. Moreover, the
assumption implies

$$E\{S(Y)[S(Y) - S'(Y)]\} = E\{S[S - S']\} = 0$$

and

$$E(S^2) = E(SS'),$$

respectively. Now

$$\operatorname{cov}(S, S') = E\{[S - g(\theta)][S' - g(\theta)]\} = E(SS') - g(\theta)^2 = E(S^2) - \psi^2 = \operatorname{var}(S).$$

Observing the inequality of Schwarz, we get

$$[\operatorname{var}(S)]^2 = \operatorname{cov}(S, S')^2 \le \operatorname{var}(S)\operatorname{var}(S')$$

and therefore as asserted

$$\operatorname{var}(S) \le \operatorname{var}(S').$$


We want to demonstrate the consequences of this theorem by returning to 
Example 2.3. 


Example 2.3 (continuation)

Our aim is to determine all UVUE of $g(p)$. Since $D_E^0$ contains only elements of
the form $U(y) = -y\,U(-1)$ and since (2.11) holds, it is necessary and sufficient for $S(y)$ to
be a UVUE that under the assumption $U(-1) \ne 0$ the equality $E_p[S(y)\,y] = 0$ is
fulfilled for all $p \in (0, 1)$, that is, $S(y)\,y$ then belongs to $D_E^0$ and satisfies therefore
the relations $U(y) = S(y)\,y = -y\,U(-1) = y\,S(-1)$.

These relations hold if $S(0)$ is an arbitrary real value and $S(y) = S(-1)$ for
$y = 1, 2, \ldots$.

If we put $S(-1) = a$, $S(0) = b$ ($a$, $b$ real), then we obtain

$$E_p(S) = p\,S(-1) + (1-p)^2 S(0) + \sum_{y=1}^{\infty} p^{y}(1-p)^2 S(-1)$$

$$= pa + b(1-p)^2 + (1-p)^2 a\,\frac{p}{1-p} = b(1-p)^2 + a\left[1 - (1-p)^2\right]$$

$$= a + (b-a)(1-p)^2.$$

Hence, $g(p)$ must be of the form $a + (b-a)(1-p)^2$ if it is to possess a UVUE,
but $g(p) = p$ is not of this form. Therefore no UVUE exists for $g(p) = p$.

The following statement is of fundamental significance for estimators
belonging to $D_E$.


Theorem 2.3 (Rao, 1945; Blackwell, 1947; Lehmann and Scheffé, 1950)
Let the components of the random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be distributed as
$P_\theta \in P = (P_\theta, \theta \in \Omega)$, and let $S(Y) \in D_E$ be unbiased with respect to $g(\theta) = \psi$. If there
is a sufficient statistic $M(Y)$ with respect to $P_\theta$, then the following exists:

$$\hat{\psi}(Y) = E[S(Y) \mid M(Y)] = h[M(Y)] \tag{2.12}$$

and is unbiased with respect to $\psi$ and

$$\operatorname{var}[\hat{\psi}(Y)] \le \operatorname{var}[S(Y)] \ \text{ for all } \theta \in \Omega.$$

If $M(Y)$ is complete (and) minimal sufficient, then $\hat{\psi}(Y)$ is with probability 1
the uniquely determined unbiased estimator of $g(\theta)$ with minimal variance for
each $\theta \in \Omega$.


Proof: Considering the sufficiency of $M(Y)$, the expectation in (2.12) does not
depend on $\theta$ and consequently is an estimator. Observing that $S(Y)$ is unbiased,
it follows via

$$E[\hat{\psi}(Y)] = E\{E[S(Y) \mid M(Y)]\} = E[S(Y)] = \psi$$

that $\hat{\psi}(Y)$ is unbiased, too. Further we get

$$\operatorname{var}[S(Y)] = E\{\operatorname{var}[S(Y) \mid M(Y)]\} + \operatorname{var}\{E[S(Y) \mid M(Y)]\}.$$

The second summand on the right-hand side of the equation is equal to
$\operatorname{var}[\hat{\psi}(Y)]$, and the first summand is non-negative. This implies the second part
of the assertion.

Now let $M(Y)$ be additionally complete and $\hat{\psi}(Y) = h[M(Y)]$. Further, let $M^*(Y)$
be an arbitrary estimator from $D_E$ dependent on $M(Y)$ such that $M^*(Y) = t[M(Y)]$.
Then for all $\theta \in \Omega$ the statement

$$E[\hat{\psi}(Y)] = E[M^*(Y)] \quad\text{and}\quad E\{h[M(Y)] - t[M(Y)]\} = 0,$$

respectively, holds. As $M(Y)$ is complete, this implies $h = t$ (with probability 1).
This completes the proof.

Under the assumption that $P_\theta$ is a $k$-parametric exponential family of full rank,
it follows from Section 1.3 that it suffices to find an estimator $S \in D_E$ and a vector

$$M(Y) = \{M_1(Y), \ldots, M_k(Y)\}^T.$$

Via (2.12) the UVUE $\hat{\psi}$ is unique with probability 1.


We want to demonstrate the applicability of this theorem by examples. 


Example 2.4 Let $Y = (y_1, y_2, \ldots, y_n)^T$ be a random sample.

a) Let the components of $Y$ be distributed as $N(\mu, 1)$, that is, $\theta = \mu$. If $g(\theta) = \mu$,
then $\bar{y} \in D_E$. Since $\bar{y}$ is complete minimal sufficient with respect to the $N(\mu, 1)$
family, $\bar{y}$ is with probability 1 the only UVUE with respect to $\mu$.

b) Let the components of $Y$ be distributed as $N(0, \sigma^2)$. Then $\sum_{i=1}^{n} y_i^2 = SQ_y$ is
complete minimal sufficient with respect to this family. It is $\theta = \sigma^2$ and we
choose $g(\theta) = \sigma^2$. As $\frac{SQ_y}{\sigma^2}$ is distributed as $CS(n)$, $\frac{SQ_y}{n}$ is with probability 1
the only UVUE with respect to $\sigma^2$.

c) Let the components of $Y$ be distributed as $N(\mu, \sigma^2)$. With $\theta = (\mu, \sigma^2)^T$ we put
$g(\theta) = (\mu, \sigma^2)^T = \theta$. Then $H^T = \left(\sum_{i=1}^{n} y_i,\ \sum_{i=1}^{n} y_i^2\right)$ is complete minimal
sufficient with respect to $\theta$. The statistic

$$M^T = \left(\bar{y},\ \sum_{i=1}^{n} (y_i - \bar{y})^2\right)$$

is equivalent to $H^T$, meaning that $H(Y_1) = H(Y_2)$ iff $M(Y_1) = M(Y_2)$. This is
clear if $\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$ is considered. Therefore $M(Y)$ is also
complete minimal sufficient with respect to $\theta$. As $\frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \bar{y})^2$ is
distributed as $CS(n-1)$, it follows that $(\bar{y}, s^2)$ with $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ is with
probability 1 the only UVUE with respect to $(\mu, \sigma^2)$.


Example 2.5 Let the components of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be
two-point distributed. W.l.o.g. we assume $P(y = 0) = 1 - p$, $P(y = 1) = p$; $0 < p < 1$.
The likelihood function

$$L(Y, p) = p^{\sum_{i=1}^{n} y_i}(1-p)^{n - \sum_{i=1}^{n} y_i}$$

shows that the distribution of $Y$ belongs to a one-parametric exponential family
and $M(Y) = \sum_{i=1}^{n} y_i$ is complete sufficient. Because of $E(y_i) = p$, we have
$E[M(Y)] = np$, and $\frac{M(Y)}{n} = \bar{y}$ is UVUE with respect to $p = g(\theta)$. Observing that
$M = M(Y)$ is distributed as $B(n, p)$ and assuming $g(\theta) = \operatorname{var}(y_i) = p(1-p)$, the
estimator

$$S(Y) = \frac{1}{n-1}(1 - \bar{y})M = \frac{n}{n-1}\,\bar{y}(1-\bar{y})$$

is unbiased with respect to $p(1-p)$ and therefore UVUE. This follows from

$$E(\bar{y} - \bar{y}^2) = E(\bar{y}) - E(\bar{y}^2) = p - \left[\operatorname{var}(\bar{y}) + p^2\right]$$

considering

$$\operatorname{var}(\bar{y}) = \frac{p(1-p)}{n}$$

and consequently

$$E\left[\frac{1}{n-1}(1-\bar{y})M\right] = \frac{n}{n-1}\left[p(1-p) - \frac{p(1-p)}{n}\right] = p(1-p).$$

$S(Y)$ is with probability 1 the uniquely determined UVUE.
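
The unbiasedness claimed in Example 2.5 can be verified exactly for any fixed $n$ and rational $p$ by summing over the binomial distribution of $M$. The following sketch is an illustration only; the function names are hypothetical, and the plug-in estimator $\bar{y}(1-\bar{y})$ is included for contrast.

```python
from fractions import Fraction
from math import comb

def expected_value(estimator, n, p):
    """Exact E_p[estimator(M, n)] for M distributed as B(n, p)."""
    return sum(comb(n, m) * p**m * (1 - p)**(n - m) * estimator(m, n)
               for m in range(n + 1))

def uvue_variance(m, n):
    """S = n/(n-1) * ybar * (1 - ybar) with ybar = m/n."""
    ybar = Fraction(m, n)
    return Fraction(n, n - 1) * ybar * (1 - ybar)

def plug_in(m, n):
    """Biased plug-in estimator ybar * (1 - ybar)."""
    ybar = Fraction(m, n)
    return ybar * (1 - ybar)

n, p = 7, Fraction(3, 10)
print(expected_value(uvue_variance, n, p), p * (1 - p))
print(expected_value(plug_in, n, p))
```

The plug-in estimator has expectation $\frac{n-1}{n}p(1-p)$, which is why the factor $\frac{n}{n-1}$ is needed.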
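
Example 2.6 Let the components of the random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be
distributed as $N(\mu, \sigma^2)$, where $\mu$ is known. Further, put $g(\theta) = \sigma^{t}$ ($t = 1, 2, \ldots$). The
estimator $S(Y) = \sum_{i=1}^{n}(y_i - \mu)^2$ is complete minimal sufficient, and $X^2 = \frac{1}{\sigma^2}S(Y)$
is $CS(n)$-distributed. The moments of $X^2$ depend only
on $n$, that is, with $t = 2r$ we have

$$E(X^{2r}) = c(n, 2r) \quad\text{and}\quad E[S^{r}(Y)] = \sigma^{2r} c(n, 2r),$$

respectively. Hence, $\frac{1}{c(n, 2r)} S^{r}(Y)$ is UVUE with respect to $\sigma^{2r}$.

The factor $c(n, 2r)$ is known from probability theory, which is

$$c(n, 2r) = \frac{2^{r}\,\Gamma\left(\frac{n}{2} + r\right)}{\Gamma\left(\frac{n}{2}\right)}. \tag{2.13}$$

For $t = 1$ and $r = \frac{1}{2}$, respectively, the UVUE with respect to $\sigma$ is obtained from

$$\frac{S^{1/2}(Y)}{c(n, 1)} = \frac{\Gamma\left(\frac{n}{2}\right)\sqrt{\sum_{i=1}^{n}(y_i - \mu)^2}}{\sqrt{2}\,\Gamma\left(\frac{n+1}{2}\right)}. \tag{2.14}$$

For $t = 2$ and $r = 1$, respectively, the UVUE with respect to $\sigma^2$ results from

$$\frac{S(Y)}{c(n, 2)} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \mu)^2.$$

However, if $\mu$ is unknown, then $(\bar{y}, (n-1)s^2)$ is complete minimal sufficient
with respect to $\theta^T = (\mu, \sigma^2)$ according to Example 2.4, and $\bar{y}$ is UVUE with
respect to $\mu$. Since $\frac{(n-1)s^2}{\sigma^2}$ is distributed as $CS(n-1)$, the UVUE with respect
to $\sigma^{t}$ is obtained from

$$E(s^{2r}) = \sigma^{2r}\,\frac{2^{r}\,\Gamma\left(\frac{n-1}{2} + r\right)}{(n-1)^{r}\,\Gamma\left(\frac{n-1}{2}\right)} \tag{2.15}$$

such that

$$\frac{\Gamma\left(\frac{n-1}{2}\right)\sqrt{n-1}}{\sqrt{2}\,\Gamma\left(\frac{n}{2}\right)}\, s \tag{2.16}$$

is UVUE with respect to $\sigma$. For $r = 1$ the estimator $s^2$ is UVUE with respect to $\sigma^2$.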
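
The correction factor in front of $s$ for the case of unknown $\mu$ in Example 2.6, namely $\sqrt{(n-1)/2}\;\Gamma\left(\frac{n-1}{2}\right)/\Gamma\left(\frac{n}{2}\right)$, is easy to evaluate numerically. The following sketch is an illustration; the function names `sigma_correction` and `uvue_sigma` are hypothetical, and the logarithm of the gamma function is used as a design choice to avoid overflow for large $n$.

```python
import math

def sigma_correction(n):
    """Factor k_n with E(k_n * s) = sigma for a normal sample of size n >= 2.

    k_n = sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2), evaluated on the log scale.
    """
    if n < 2:
        raise ValueError("need n >= 2")
    log_k = (0.5 * math.log((n - 1) / 2.0)
             + math.lgamma((n - 1) / 2.0)
             - math.lgamma(n / 2.0))
    return math.exp(log_k)

def uvue_sigma(ys):
    """Bias-corrected estimate of sigma from a sample with unknown mean."""
    n = len(ys)
    mean = sum(ys) / n
    s = math.sqrt(sum((y - mean) ** 2 for y in ys) / (n - 1))
    return sigma_correction(n) * s

for n in (2, 3, 5, 10, 100):
    print(n, sigma_correction(n))
```

The factor exceeds 1 for every $n$ because $s$ underestimates $\sigma$ on average, and it tends to 1 as $n$ grows.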


Example 2.7 Let the components of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be
distributed as $P(\lambda)$, $0 < \lambda < \infty$. Now we want to estimate
$g(\lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}$ ($k = 0, 1, 2, \ldots$), a value of the probability function for a given $k$.
An unbiased estimator based on the first element $y_1$ of the random sample $Y$ is given by the
indicator function of the set $J = \{Y : y_1 = k\}$,

$$S_1(Y) = 1_J(Y) \quad (k = 0, 1, 2, \ldots),$$

that is, $S_1(Y)$ is equal to 1 for $y_1 = k$ and 0 else, so that
$E[S_1(Y)] = P(y_1 = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$. As $M(Y) = \sum_{i=1}^{n} y_i$ is complete minimal
sufficient with respect to $\lambda$, we can determine a UVUE $S_2(Y)$
according to

$$S_2(Y) = E[S_1(Y) \mid M(Y)] = P[y_1 = k \mid M(Y)].$$

For all $M = M(Y)$ the conditional distribution of $y_1$ for a given value of $M$ is a
binomial distribution with $M$ trials and success probability $1/n$. This is not difficult to see. Since
the $y_i$ are independent and identically distributed, for a fixed sum $M$, each $y_i$
takes the value $a$ ($a = 0, \ldots, M$) with probability

$$\binom{M}{a}\left(\frac{1}{n}\right)^{a}\left(1 - \frac{1}{n}\right)^{M-a}.$$

Therefore the UVUE is equal to

$$S_2(Y) = \begin{cases} 0 & \text{for } M(Y) = 0,\ k > 0 \\ 1 & \text{for } M(Y) = 0,\ k = 0 \\ \binom{M}{k}\left(\frac{1}{n}\right)^{k}\left(\frac{n-1}{n}\right)^{M-k} & \text{for } M(Y) > 0,\ k = 0, \ldots, M. \end{cases}$$
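
The unbiasedness of the estimator obtained in Example 2.7 can be checked numerically, because $M(Y)$ is Poisson distributed with parameter $n\lambda$. The following sketch is an illustration under stated assumptions: the infinite sum over $M$ is truncated at `m_max`, the binomial exponent is taken as $M - k$, and the function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(y = k) for y distributed as P(lam), computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def uvue_poisson_pmf(k, m, n):
    """Conditional probability P(y_1 = k | M = m): binomial(m, 1/n) at k."""
    if k > m:
        return 0.0
    return math.comb(m, k) * (1.0 / n) ** k * (1.0 - 1.0 / n) ** (m - k)

def expected_uvue(k, lam, n, m_max=400):
    """Truncated E[S_2] where M is Poisson distributed with parameter n*lam."""
    return sum(poisson_pmf(m, n * lam) * uvue_poisson_pmf(k, m, n)
               for m in range(m_max + 1))

k, lam, n = 2, 1.3, 5
print(expected_uvue(k, lam, n), poisson_pmf(k, lam))
```

The two printed values agree up to floating-point error, in line with $E[S_2(Y)] = \lambda^{k}e^{-\lambda}/k!$.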
nN 


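Theorem 2.4 If a random vector $Y = (y_1, y_2, \ldots, y_k)^T$ is $N(\mu, \Sigma)$-distributed
with $\mu = (\mu_1, \ldots, \mu_k)^T \in R^{k}$, $\operatorname{rk}(\Sigma) = k$ and $|\Sigma| > 0$, then the UVUE of the
$\frac{k(k+3)}{2}$-dimensional parameter vector

$$\theta = \left(\mu_1, \ldots, \mu_k,\ \sigma_1^2, \ldots, \sigma_k^2,\ \sigma_{1,2}, \ldots, \sigma_{k-1,k}\right)^T$$

based on a random sample $X = (Y_1, \ldots, Y_n)^T$ with components $Y_j$ distributed as
$Y$ is given by

$$\hat{\theta} = \left(\bar{y}_{1.}, \ldots, \bar{y}_{k.},\ s_1^2, \ldots, s_k^2,\ s_{1,2}, \ldots, s_{k-1,k}\right)^T$$

where

$$\bar{y}_{i.} = \frac{1}{n}\sum_{j=1}^{n} y_{ij},\quad s_i^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_{i.}\right)^2,\quad s_{i,l} = \frac{1}{n-1}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_{i.}\right)\left(y_{lj} - \bar{y}_{l.}\right),\ i < l.$$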


Proof: If $k = 2$, then the family of two-dimensional normal distributions with
positive definite covariance matrix is a five-parametric exponential family, and

$$M_2(X) = \left(\bar{y}_{1.}, \bar{y}_{2.}, SS_1, SS_2, SP_{1,2}\right)$$

is a complete minimal sufficient statistic with respect to this family, where

$$SS_i = \sum_{j=1}^{n}\left(y_{ij} - \bar{y}_{i.}\right)^2,\quad i = 1, 2$$

and

$$SP_{1,2} = \sum_{j=1}^{n}\left(y_{1j} - \bar{y}_{1.}\right)\left(y_{2j} - \bar{y}_{2.}\right).$$

The marginal distributions of $y_i$, $i = 1, 2$, are $N(\mu_i, \sigma_i^2)$ with the
UVUE $\bar{y}_{i.}$ and

$$s_i^2 = \frac{SS_i}{n-1}.$$

If we can show that $E(SP_{1,2}) = (n-1)\sigma_{1,2}$, then the proof of the theorem is
completed for $k = 2$, because the five parameters are then estimated by an
unbiased estimator only depending on the sufficient statistic $M_2(X)$.

But by definition it is now

$$\sigma_{1,2} = E[(y_1 - \mu_1)(y_2 - \mu_2)] = E(y_1 y_2) - \mu_1\mu_2.$$

Then, we have

$$SP_{1,2} = \sum_{j=1}^{n} y_{1j}y_{2j} - n\,\bar{y}_{1.}\bar{y}_{2.}$$

and

$$E(SP_{1,2}) = \sum_{j=1}^{n} E(y_{1j}y_{2j}) - n\,E(\bar{y}_{1.}\bar{y}_{2.}) = n(\sigma_{1,2} + \mu_1\mu_2) - (\sigma_{1,2} + \mu_1\mu_2) - (n-1)\mu_1\mu_2 = \sigma_{1,2}(n-1),$$

since

$$E(\bar{y}_{1.}\bar{y}_{2.}) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E(y_{1i}y_{2j}) = \frac{1}{n}(\sigma_{1,2} + \mu_1\mu_2) + \frac{n-1}{n}\mu_1\mu_2,$$

taking into account that the $y_{1i}$ and $y_{2j}$ are independent for $i \ne j$.

Now we consider the case $k > 2$ where $X$ follows a $\frac{k(k+3)}{2}$-parametric
exponential family with a complete minimal sufficient statistic defined analogously
to the case $k = 2$:

$$M_k(X) = \left(\bar{y}_{1.}, \ldots, \bar{y}_{k.},\ SS_1, \ldots, SS_k,\ SP_{1,2}, \ldots, SP_{k-1,k}\right).$$

The $\frac{k(k+3)}{2}$ parameters can be estimated from two-dimensional marginal
distributions unbiasedly and only depending on $M_k(X)$. This finishes the proof.


Sometimes it is useful to compare the variance of any estimator from $D_E$ to
the variance of the UVUE. This leads to a new definition.

Definition 2.5 Let $S_0(Y)$ and $S(Y)$ be estimators from $D_E$ with respect to $g(\theta)$
and let $S_0(Y)$ be a UVUE. Then the ratio

$$\frac{\operatorname{var}(S_0(Y))}{\operatorname{var}(S(Y))}$$

is called the relative efficiency of $S(Y)$. All UVUE are called efficient estimators.

If there is no UVUE, then we often look for the best linear or the best
quadratic estimators, meaning that the risk has to be minimised in the class $D_L$ of
linear, in the class $D_{LE}$ of linear unbiased, in the class $D_Q$ of quadratic or in the
class $D_{QE}$ of quadratic unbiased estimators, respectively. The best linear or
quadratic estimators and the best linear predictions are treated in the chapters
about linear models. Linear estimators are used to estimate fixed effects in linear
models, whereas quadratic estimators are suitable to estimate variance
components of random effects in linear models.


2.2 Variance-Invariant Estimation

In applications of statistics, measurements are carried out within a certain
scale, which is sometimes chosen arbitrarily. In the biological testing for
active agents, for instance, concentrations of solutions are registered directly
using a logarithmic scale. Temperatures are given in degrees with respect to
the Celsius, Fahrenheit, Réaumur or Kelvin scales. Angles are measured in
degrees or radians. Assume now that two methods of measuring differ only
by an additive constant $c$, such that $y_i^* = y_i + c$ holds for the realisations of
random samples $Y^*$ and $Y$, respectively. If components $y_i^*$ and $y_i$ of these random
samples are distributed as $P_{\theta^*}$ and $P_\theta$, respectively, and if $\theta^* = \theta + c$, then the
relations

$$S(Y^*) = S(Y) + c$$

and

$$\operatorname{var}[S(Y^*)] = \operatorname{var}[S(Y)]$$

are fulfilled. The variances of the estimators are equal in both problems, and we
say that the problem of estimation is variance-invariant with respect to
translations.


Definition 2.6 Let a random variable $y$ be distributed as $P_\theta \in P = (P_\theta, \theta \in \Omega)$
and take values $y \in \{Y\}$ in the sample space $\{Y\}$. Further, let $h$ be a measurable
one-to-one mapping of $\{Y\}$ onto $\{Y\}$ such that for each $\theta \in \Omega$ the distribution $P_{\theta^*}$
of $h(y) = z$ is in $P = (P_\theta, \theta \in \Omega)$, too, where $\bar{h}(\theta) = \theta^*$ covers with $\theta$ the
whole parameter space. Then we say that $P_\theta \in P = (P_\theta, \theta \in \Omega)$ is invariant relative
to $h$, where $\bar{h}$ is the mapping from $\Omega$ onto itself induced by $h$. If $\{T\}$ is a class of
transformations such that $P_\theta \in P = (P_\theta, \theta \in \Omega)$ is invariant relative to the
whole class, and if $H(\{T\}) = H$ is the set of all transformations arising by
taking all finite products of transformations in $\{T\}$ and their inverses, then
$P_\theta \in P = (P_\theta, \theta \in \Omega)$ is invariant relative to $H$, and $H$ is the group induced by $\{T\}$.

W.l.o.g. we will therefore assume in the following that a class of
transformations is a group, where the operation is the product (or concatenation, the
stepwise application of transformations). In the following let $H$ be a group of
one-to-one mappings of $\{Y\}$ onto itself. If $P = (P_\theta, \theta \in \Omega)$ is invariant relative to $H$, then
we obtain

$$E_\theta[h(y)] = E_{\bar{h}(\theta)}[y] \tag{2.17}$$

because

$$E_\theta[h(y)] = \int h(y)\, dP_\theta = \int y\, dP_{\bar{h}(\theta)} = E_{\bar{h}(\theta)}[y].$$

Example 2.8 The family of normal distributions $N(\mu, \sigma^2)$ with $\theta^T = (\mu, \sigma^2)$ is
invariant relative to the group of real affine transformations. Namely, if
$z = h(y) = a + by$ ($a, b \in R^1$, $b \ne 0$) and if $y$ is distributed as $N(\mu, \sigma^2)$, then it is known that $z$ is
distributed as $N(\mu^*, \sigma^{*2})$ with $\mu^* = a + b\mu$, $\sigma^{*2} = b^2\sigma^2$.

Moreover, $\theta^{*T} = (\mu^*, \sigma^{*2})$ covers with $\theta^T$ for fixed $a$ and $b$ the whole set $\Omega$.


Definition 2.7 Let $Y = (y_1, y_2, \ldots, y_n)^T$ be a random sample with realised
components $y_i \in \{Y\}$ and let $H$ be a group of transformations as described before.
Further, let $y_i$ be distributed as $P_\theta \in P = (P_\theta, \theta \in \Omega)$ and assume that $P$ is invariant
relative to $H$. Then a statistic $M(Y) = M(y_1, \ldots, y_n)$ is said to be invariant relative
to $H$ if

$$h[M(Y)] = M(y_1, \ldots, y_n) \tag{2.18}$$

holds with $h(M) = M[h(y_1), \ldots, h(y_n)]$ for all $h \in H$. If $M(Y)$ is an estimator and

$$h[M(Y)] = \bar{h}[M(Y)] \tag{2.19}$$

is fulfilled, then $M(Y)$ is said to be equivariant (relative to $H$).

The induced transformations $\bar{h}$ from $\Omega$ onto $\Omega$ introduced in Definition 2.6
constitute a group $\bar{H}$ if $h$ runs through the whole group $H$.

Let the components of the random sample $Y$ be distributed as $N(\mu, \sigma^2)$, and let
$H$ be the group of real affine transformations introduced in Example 2.8. The
(minimal sufficient and complete) estimator $S^T(Y) = (\bar{y}, s^2)$ is equivariant,
because, according to Example 2.8, we have

$$\bar{h}(\theta^T) = \theta^{*T} = (a + b\mu,\ b^2\sigma^2) \quad\text{and}\quad h[S^T(Y)] = (a + b\bar{y},\ b^2 s^2).$$


Point Estimation 


Definition 2.8 If $g(\theta) = \psi \in Z$ is to be estimated in a problem of estimation, if besides $g(\theta_1) = g(\theta_2)$ implies $g[\bar h(\theta_1)] = g[\bar h(\theta_2)]$ for all $h \in H$ and if finally

\[ \| g[\bar h(\theta)] - h[S(Y)] \|^2 = \| g(\theta) - S(Y) \|^2 \]

holds for each estimator $S(Y) \in D_E$ with respect to $g(\theta)$ and for all $h \in H$, $\theta \in \Omega$, then the problem of estimation is said to be invariant relative to $H$ (with respect to the quadratic loss).


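Theorem 2.5 If $S(Y)$ is an equivariant estimator with finite variance in a problem of estimation, which is invariant relative to a group $H$ of transformations, then

\[ \mathrm{var}_{\bar h(\theta)}[S(Y)] = \mathrm{var}_\theta[S(Y)]. \tag{2.20} \]

Proof: Observing

\[ \mathrm{var}_{\bar h(\theta)}[S(Y)] = \int_{\{Y\}} \| g[\bar h(\theta)] - S(Y) \|^2 \, dP_{\bar h(\theta)} = E_\theta\left\{ \| g[\bar h(\theta)] - h[S(Y)] \|^2 \right\} \]

from (2.17) and the invariance of the estimation problem, the assertion follows, namely

\[ \mathrm{var}_{\bar h(\theta)}[S(Y)] = E_\theta\left\{ \| g(\theta) - S(Y) \|^2 \right\} = \mathrm{var}_\theta[S(Y)]. \]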


Corollary 2.1 Under the assumptions of Theorem 2.5, the variance of every estimator that is equivariant with respect to $H$ is constant in $\Omega$ (i.e. independent of $\theta$), if the group $\bar H$ is transitive over $\Omega$.


Proof: The transitivity of a transformation group $\bar H$ over $\Omega$ means that for any pair $(\theta_1, \theta_2)$ of elements of $\Omega$ there exists a transformation $\bar h \in \bar H$, which transfers $\theta_1$ into $\theta_2$. Then it follows by Theorem 2.5 for each such pair

\[ \mathrm{var}_{\theta_1}[S(Y)] = \mathrm{var}_{\theta_2}[S(Y)] = \text{const}. \]


Let $P = (P_\theta, \theta \in \Omega)$ be a group family of distributions, which is invariant relative to a group $H$ of transformations for which $\bar H$ acts transitively over $\Omega$ and where, additionally, $g(\theta_1) \neq g(\theta_2)$ always implies $\bar h[g(\theta_1)] \neq \bar h[g(\theta_2)]$.

If the distribution of $y$ is given by $P_{\theta_0}$ with arbitrary $\theta_0 \in \Omega$, then the set of distributions induced by $h(y)$ with $h \in H$ is just the group family $(P_\theta, \theta \in \Omega)$. Therefore a group family is invariant relative to the group of transformations defining this family. Hence, especially the location families are invariant relative to translations.

Now we look for equivariant estimators with minimal mean square deviation. If $D_E$ is the class of equivariant estimators with existing second moment and if for $R$ in (2.2) the mean square deviation $\mathrm{MSD}[S(Y)]$ is chosen, then an estimator $S_0(Y) \in D_E$ satisfying

\[ \mathrm{MSD}[S_0(Y)] = \inf_{S(Y) \in D_E} \mathrm{MSD}[S(Y)] \]

is called equivariant estimator with minimal MSD.


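Example 2.9 Let the components of a random sample $Y$ be distributed as $N(\mu, \sigma^2)$ where $\theta^T = (\mu, \sigma^2) \in \Omega$. As in Example 2.8 let $H$ be the group of affine transformations. The statistic $M^T(Y) = (\bar y, SS_y)$ with $SS_y = \sum_{i=1}^n (y_i - \bar y)^2$ is minimal sufficient with respect to $\theta$. Let $\hat\mu(\bar y, SS_y)$ be equivariant for $g_1(\theta) = \mu$, and let $\hat\sigma^2(\bar y, SS_y)$ be equivariant for $g_2(\theta) = \sigma^2$, that is, for all $h \in H$ and $\bar h \in \bar H$ the relations

\[ h[\hat\mu(\bar y, SS_y)] = a + b\,\hat\mu(\bar y, SS_y) = \hat\mu(a + b\bar y,\; b^2 SS_y) = \bar h[\hat\mu(\bar y, SS_y)] \]

and analogously

\[ h[\hat\sigma^2(\bar y, SS_y)] = b^2\,\hat\sigma^2(\bar y, SS_y) \]

are fulfilled. If we put $a = -\bar y$, $b = 1$, then $h[M^T(Y)] = (0, SS_y)$ arises, and we can write all equivariant estimators with respect to $\mu$ as $\hat\mu(\bar y, SS_y) = \bar y + w(SS_y)$ and all equivariant estimators with respect to $\sigma^2$ as $\hat\sigma^2(\bar y, SS_y) = \alpha\, SS_y$ with suitably chosen $w$ and $\alpha$. Since $\bar y$ and $SS_y$ are independent,

\[ \mathrm{MSD}[\bar y + w(SS_y)] = \mathrm{var}(\bar y) + E[w^2(SS_y)] \]

holds. This becomes minimal for $w(SS_y) = 0$, such that $\bar y$ is the equivariant estimator with minimal MSD with respect to $\mu$. Furthermore it is

\[ \mathrm{MSD}[\alpha\, SS_y] = E[(\alpha\, SS_y - \sigma^2)^2] = \alpha^2 E(SS_y^2) - 2\alpha\sigma^2 E(SS_y) + \sigma^4. \]

Since $SS_y / \sigma^2$ is distributed as $CS(n-1)$, we get $E(SS_y) = (n-1)\sigma^2$, $\mathrm{var}(SS_y) = 2(n-1)\sigma^4$, hence $E(SS_y^2) = (n-1)(n+1)\sigma^4$, and consequently

\[ \mathrm{MSD}[\alpha\, SS_y] = \alpha^2\sigma^4(n-1)(n+1) - 2\alpha\sigma^4(n-1) + \sigma^4. \]

This expression becomes minimal for $\alpha = \frac{1}{n+1}$. Therefore

\[ \hat\sigma^2 = \frac{SS_y}{n+1} \]

is the equivariant estimator with minimal MSD with respect to $\sigma^2$.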




2.3 Methods for Construction and Improvement 
of Estimators 


In Sections 2.1 and 2.2 we checked whether estimators fulfil certain optimality criteria. But first we need one or more estimators at our disposal for use in applications. In the following we want to consider methods for constructing estimators.


2.3.1. Maximum Likelihood Method 


We assume now that the likelihood function $L(Y, \theta)$ for all $Y \in \{Y\}$ has a uniquely determined supremum with respect to $\theta \in \Omega$. The reader should not be confused by the double meaning of $L(\cdot,\cdot)$, which denotes both the loss function and the likelihood function; this is common usage in the statistical community.


Definition 2.9 Fisher (1925)
Let the components of the random sample $Y$ be distributed as $P_\theta \in P = (P_\theta, \theta \in \Omega)$ and let $L(Y, \theta)$ be the corresponding likelihood function.

An estimator $S_{ML}(Y)$ is said to be the maximum likelihood estimator or shortly ML estimator (MLE) with respect to $g(\theta) = \psi \in Z$, if its realisation is defined for each realisation $Y$ of $Y$ by

\[ L\{Y, g[S_{ML}(Y)]\} = \max_{\psi \in Z} L(Y, \psi) = \max_{\theta \in \Omega} L[Y, g(\theta)]. \tag{2.21} \]

It is obvious that equivalent likelihood functions imply the same set of MLE. Many standard distributions possess, as generally supposed in this section, exactly one MLE. Sometimes their determination causes considerable numerical problems. For exponential families the calculations can be simplified by looking for the supremum of $\ln L(Y, \theta)$ instead of $L(Y, \theta)$. Since the logarithmic function is strictly monotone increasing, this implies the same extremal $\theta$.

If the distribution of the components of a random sample $Y$ follows a $k$-parametric exponential family with a natural parameter $\eta$, then

\[ \ln L(Y, \eta) = \sum_{j=1}^{k} \eta_j M_j(Y) - nA(\eta) + \ln h(Y) \]

with $M_j(Y) = \sum_{i=1}^{n} M_j(y_i)$.

For $\psi = g(\theta) = \eta$ with $\eta = (\eta_1, \ldots, \eta_k)^T$, the MLE of $\eta$ is obtained by solving the simultaneous equations

\[ \frac{\partial}{\partial\eta_j} A(\eta) = \frac{1}{n} M_j(Y), \quad j = 1, \ldots, k \tag{2.22} \]

if $A(\eta)$ is partially differentiable relative to $\eta_j$. If the expectations of $M_j(Y)$ exist for a random sample $Y$, then

\[ \frac{\partial}{\partial\eta_j} A(\eta) = E[M_j(y_i)] \]

and

\[ E\left\{ \frac{1}{n} M_j(Y) \right\} = \frac{\partial}{\partial\eta_j} A(\eta), \quad j = 1, \ldots, k \]

follow. Moreover, if $A(\eta)$ is twice partially differentiable relative to the coordinates of $\eta$ and if the matrix

\[ \left( \frac{\partial^2 A(\eta)}{\partial\eta_j\, \partial\eta_l} \right) \]

of these partial derivatives is positive definite at $\eta = S_{ML}(Y)$ (for all $\eta = \psi \in Z$), then (2.22) has the unique solution $S_{ML}(Y)$, which is minimal sufficient.


Example 2.10 Let $Y = (y_1, y_2, \ldots, y_n)^T$ be a random sample with components $y_i$ satisfying a two-point distribution, where $y_i$ ($i = 1, \ldots, n$) takes the value 1 with the probability $p$ and the value 0 with the probability $1 - p$ ($\Omega = (0; 1)$, $\theta = p$).

Then by putting $y = \sum_{i=1}^{n} y_i$ and $g(p) = p$, we get

\[ L(Y, p) = \prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i} = p^{y}(1-p)^{n-y}, \quad y = 0, 1, \ldots, n. \]

This likelihood function is equivalent to the likelihood function of a random sample $Y$ of size 1 distributed as $B(n, p)$. By setting the derivative

\[ \frac{\partial \ln L(Y, p)}{\partial p} = \frac{y}{p} - \frac{n - y}{1 - p} \]

equal to 0, we get the solution $p = \frac{y}{n}$, which supplies a maximum of $L$ as the second derivative of $\ln L$ relative to $p$ is negative. Therefore the uniquely determined ML estimator is

\[ S_{ML}(Y) = \frac{y}{n}. \]
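The result of Example 2.10 can be checked numerically. The following lines are a minimal sketch, assuming a hypothetical realisation of size $n = 10$; the function names and the crude grid search are illustrative choices and not part of the original example.

```python
import math

def bernoulli_log_likelihood(p, y, n):
    """ln L(Y, p) = y ln p + (n - y) ln(1 - p) for 0 < p < 1."""
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

def bernoulli_mle(sample):
    """Closed-form ML estimate y / n for a sample of zeros and ones."""
    return sum(sample) / len(sample)

# Hypothetical realisation of a random sample of size n = 10.
sample = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]
n, y = len(sample), sum(sample)
p_hat = bernoulli_mle(sample)

# Crude check: on a grid over (0, 1) the log-likelihood is largest at y / n.
grid = [i / 1000 for i in range(1, 1000)]
best_on_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, y, n))
print(p_hat, best_on_grid)
```

The grid search only illustrates that the closed-form solution maximises $\ln L$; in practice the formula $y/n$ is used directly.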


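Example 2.11 Let the components of a random sample $Y$ be distributed as $N(\mu, \sigma^2)$ and let $\theta = (\mu, \sigma^2)^T \in \Omega$. Then we obtain

\[ \ln L(Y, \theta) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2. \]

We consider two cases.

a) Let $g(\theta) = \theta$. By partial differentiation we get

\[ \frac{\partial \ln L(Y, \theta)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu), \]

\[ \frac{\partial \ln L(Y, \theta)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \mu)^2. \]

After putting both right-hand sides of these equations equal to 0, we arrive at the unique solution

\[ S(Y) = \left( \bar y,\; \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar y)^2 \right)^T \]

of this system. Since the matrix of second partial derivatives of $\ln L$ is negative definite at this point, we have the ML estimator

\[ S_{ML}(Y) = \left( \bar y,\; \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar y)^2 \right)^T = (\tilde\mu, \tilde\sigma^2)^T. \]

b) Let $g(\theta) = (\mu, \sigma)^T$. By partial derivation we obtain instead of

\[ \frac{\partial \ln L(Y, \theta)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \mu)^2 \]

now as second equation

\[ \frac{\partial \ln L(Y, \theta)}{\partial\sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(y_i - \mu)^2. \]

Setting again the partial derivatives to 0 and solving the system, we find the ML estimator

\[ S_{ML}(Y) = \left( \bar y,\; \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar y)^2} \right)^T = (\tilde\mu, \tilde\sigma)^T. \]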


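Since the $N(\mu, \sigma^2)$ distributions form an exponential family with the natural parameters

\[ \eta_1 = \frac{\mu}{\sigma^2}, \quad \eta_2 = -\frac{1}{2\sigma^2}, \]

the function $\ln L(Y, \theta)$ can be written as

\[ \ln L(Y, \theta) = \ln L^*(Y, \eta) = -\frac{n}{2}\ln 2\pi + \eta_1 M_1 + \eta_2 M_2 - nA(\eta), \]

where

\[ M_1 = \sum_{i=1}^{n} y_i, \quad M_2 = \sum_{i=1}^{n} y_i^2 \quad \text{and} \quad A(\eta) = -\frac{\eta_1^2}{4\eta_2} + \frac{1}{2}\ln\left(-\frac{1}{2\eta_2}\right). \]

If we partially differentiate $\ln L^*(Y, \eta)$ relative to $\eta_1$ and $\eta_2$, then we get with $S_{ML}(Y) = (\tilde\eta_1, \tilde\eta_2)^T$ after putting the partial derivatives to 0 the expressions

\[ -\frac{\tilde\eta_1}{2\tilde\eta_2} = \frac{M_1}{n}, \quad \frac{\tilde\eta_1^2}{4\tilde\eta_2^2} - \frac{1}{2\tilde\eta_2} = \frac{M_2}{n} \]

and finally, after solving the system, the solutions

\[ \tilde\eta_1 = \frac{\bar y}{\tilde\sigma^2}, \quad \tilde\eta_2 = -\frac{1}{2\tilde\sigma^2} \quad \text{with} \quad \tilde\sigma^2 = \frac{M_2}{n} - \left(\frac{M_1}{n}\right)^2. \]

Since the matrix $\left( \frac{\partial^2 A(\eta)}{\partial\eta_j\, \partial\eta_l} \right)$ is positive definite, the estimators $(\tilde\eta_1, \tilde\eta_2)$ and $(\tilde\mu, \tilde\sigma^2)$, respectively, are minimal sufficient.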
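The link between the ML estimates in Example 2.11 and the likelihood equations (2.22) in natural parameters can be verified on data. The following sketch assumes a small hypothetical sample; the helper names are chosen for illustration only.

```python
import math

def normal_mle(sample):
    """ML estimates: mean and (1/n) * sum (y_i - mean)^2."""
    n = len(sample)
    mean = sum(sample) / n
    sigma2 = sum((y - mean) ** 2 for y in sample) / n
    return mean, sigma2

def natural_parameters(mu, sigma2):
    """eta_1 = mu / sigma^2 and eta_2 = -1 / (2 sigma^2)."""
    return mu / sigma2, -1.0 / (2.0 * sigma2)

def grad_A(eta1, eta2):
    """Partial derivatives of A(eta) = -eta1^2 / (4 eta2) + 0.5 ln(-1 / (2 eta2))."""
    d_eta1 = -eta1 / (2.0 * eta2)
    d_eta2 = eta1 ** 2 / (4.0 * eta2 ** 2) - 1.0 / (2.0 * eta2)
    return d_eta1, d_eta2

sample = [4.1, 5.3, 3.8, 6.0, 4.9, 5.5]  # hypothetical data
n = len(sample)
mu_ml, sigma2_ml = normal_mle(sample)
eta1, eta2 = natural_parameters(mu_ml, sigma2_ml)
M1 = sum(sample)
M2 = sum(y * y for y in sample)
dA1, dA2 = grad_A(eta1, eta2)

# Both differences should vanish up to rounding, as required by (2.22).
print(dA1 - M1 / n, dA2 - M2 / n)
```

The two printed differences are the left-hand sides minus the right-hand sides of (2.22) evaluated at the ML estimate.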


Often numerical problems occur if the equations that have to be solved are non-linear or if even the function $L(Y, \theta)$ cannot be differentiated with respect to $\theta$.

As a consequence of the decomposition theorem (Theorem 1.1) the following statement is given.


Theorem 2.6 If the statistic $M(Y)$ is sufficient with respect to $P_\theta$ under the conditions of Definition 2.9, then a ML estimator $S_{ML}(Y)$ with respect to $\theta$ only depends on $M(Y)$.


2.3.2 Least Squares Method 


If the form of a distribution function from the family $P = (P_\theta, \theta \in \Omega)$ is unknown or (as in the case of non-parametric families) not sufficiently specified, then the maximum likelihood method does not apply. For the following method, we need a model for the components $y_i$ of the random sample. Then the problem of estimation consists in the estimation of the model parameters. We now write for the components

\[ y_i = E(y_i) + e_i = f(\theta) + e_i \tag{2.23} \]

with a real function $f$ that is known up to the parameter $\theta$ and with random 'errors' $e_i$. Hence, we suppose that we know a parametric model $f(\theta)$ for the expectations $E(y_i)$ of $y_i$. So we have to estimate the model parameter $\theta$ and, if necessary, the distribution parameters of $e_i$.


Since $Y = (y_1, y_2, \ldots, y_n)^T$ is a random sample, all $e_i$ have the same distribution and are (stochastically) independent, that is, $e = (e_1, \ldots, e_n)^T$ is a vector of identically and independently distributed components.

Besides we assume that $E(e_i) = 0$. We restrict ourselves to estimating only $\theta$ and $\mathrm{var}(e_i) = \sigma^2$. The model class (2.23) originates from the theory of errors. If an object is measured $n$ times and if the measuring method includes errors, then the measured values $y_i$ differ by an experimental error $e_i$ from the real value $\mu$ (in this case we have $f(\theta) = \mu$).

The question arises how to get a statement about $\mu$ by the $n$ single measurements $y_i$. Gauss, but earlier also Legendre, proposed the least squares method (LSM) that determines a value $\theta \in \Omega$ with minimal sum of squared errors $\sum_{i=1}^{n} e_i^2$.


Definition 2.10 A measurable statistic $S_Q(Y)$ whose realisation $S_Q(Y)$ fulfils the condition

\[ \sum_{i=1}^{n} \{ y_i - f[S_Q(Y)] \}^2 = \min_{\theta \in \Omega} \sum_{i=1}^{n} \{ y_i - f(\theta) \}^2 \tag{2.24} \]

is said to be an estimator according to the least squares method with respect to $\theta \in \Omega$ or shortly LSM estimator of $\theta \in \Omega$.


Usually the variance $\sigma^2 = \mathrm{var}(e_i)$ is estimated by

\[ s^2 = \frac{1}{n - \dim(\Omega)} \sum_{i=1}^{n} \{ y_i - f[S_Q(Y)] \}^2 \tag{2.25} \]

if $\dim(\Omega) < n$ holds.

The LSM estimator is mainly used in the theory of linear (and also non-linear) models. In these models $Y$ is not a random sample, since the components of $Y$ have different expectations. For example, for a simple linear model, we have

\[ y_i = \beta_0 + \beta_1 x_i + e_i \quad (i = 1, \ldots, n), \]

and if $\beta_1 \neq 0$, the expectation $E(y_i)$ is dependent on $x_i$. Hence, the vector $Y = (y_1, \ldots, y_n)^T$ is not a random sample. Nevertheless the parameters of the model can be estimated by the LSM. We refer to Chapters 4 and 8 where parameter estimation in linear models is investigated. The LSM can be generalised also for dependent $e_i$ with arbitrary, but known positive definite covariance matrix.
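For the simple linear model mentioned above, condition (2.24) and the variance estimate (2.25) can be sketched in a few lines. The data below are hypothetical, and the closed-form solution of the normal equations is the standard one for a straight line; it is used here as an illustration, not as the treatment given in Chapters 4 and 8.

```python
def least_squares_line(x, y):
    """LSM estimates (b0, b1) minimising sum (y_i - b0 - b1 * x_i)^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residual_sum_of_squares(x, y, b0, b1):
    """Sum of squared deviations between observations and fitted line."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def residual_variance(x, y, b0, b1):
    """s^2 as in (2.25); here dim(Omega) = 2 regression parameters."""
    return residual_sum_of_squares(x, y, b0, b1) / (len(x) - 2)

# Hypothetical measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_line(x, y)
s2 = residual_variance(x, y, b0, b1)
print(b0, b1, s2)
```

Any other pair of coefficients gives a residual sum of squares that is at least as large, which is exactly the minimum property required in (2.24).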


2.3.3. Minimum Chi-Squared Method 


The minimum chi-squared method (or minimum $\chi^2$ method; this notation is used in the following) is applicable if the observed values are frequencies of observations, which belong to a finite number of mutually disjoint subsets whose union represents the totality of possible realisations of a component of a random sample $Y$. It is unimportant whether these classes are possible realisations of a discrete random variable (natural classes) or subsets of values of continuous random variables. In each case let $n_1, \ldots, n_k$ be the numbers of components of a random sample $Y$, which fall into the $k$ classes. Because of

\[ \sum_{i=1}^{k} n_i = n \]

the variables $n_i$ are dependent. Let $\psi_1 = g_1(\theta), \ldots, \psi_k = g_k(\theta)$ be the corresponding probabilities, determined by the distribution $P_\theta$, for which an element of a random sample $Y$ belongs to one of the $k$ classes.


Definition 2.11 An estimator $S_0(Y)$ whose realisations fulfil

\[ X^2 = \sum_{i=1}^{k} \frac{\{ n_i - n g_i[S_0(Y)] \}^2}{n g_i[S_0(Y)]} = \min_{\theta \in \Omega} \sum_{i=1}^{k} \frac{\{ n_i - n g_i(\theta) \}^2}{n g_i(\theta)} \tag{2.26} \]

is said to be a minimum $\chi^2$ estimator.


The notation minimum $\chi^2$ estimator originates from the fact that, under regularity conditions, the minimum of $X^2$ is asymptotically distributed as $CS(k - 1 - p)$ with $p = \dim(\Omega)$. If the functions $g_i(\theta)$ are differentiable relative to $\theta$, then a minimum of the convex function $X^2$ in (2.26) is obtained if the partial derivatives of $X^2$ relative to the components of $\theta$ are put equal to 0 and the simultaneous equations are solved. This leads to


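\[ \sum_{i=1}^{k} \left( \frac{n_i - n g_i[S_0(Y)]}{g_i[S_0(Y)]} + \frac{\{ n_i - n g_i[S_0(Y)] \}^2}{2n \{ g_i[S_0(Y)] \}^2} \right) \left. \frac{\partial g_i(\theta)}{\partial\theta_j} \right|_{\theta = S_0(Y)} = 0. \tag{2.27} \]

Unfortunately it is difficult to solve (2.27). But often the second part in this sum can be neglected without severe consequences. In these cases (2.27) is replaced by the simpler equation

\[ \sum_{i=1}^{k} \frac{n_i - n g_i[S_0(Y)]}{g_i[S_0(Y)]} \left. \frac{\partial g_i(\theta)}{\partial\theta_j} \right|_{\theta = S_0(Y)} = 0. \tag{2.28} \]

This approach is also called modified minimum $\chi^2$ method.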
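Since (2.27) is often hard to solve analytically, the minimum in (2.26) can also be searched numerically. The following sketch assumes a hypothetical one-parametric model with $k = 3$ classes and probabilities $g(\theta) = (\theta^2,\, 2\theta(1-\theta),\, (1-\theta)^2)$ and hypothetical class frequencies; a plain grid search stands in for a proper optimiser.

```python
def class_probabilities(theta):
    """Hypothetical model: g_1 = theta^2, g_2 = 2 theta (1 - theta), g_3 = (1 - theta)^2."""
    return [theta ** 2, 2.0 * theta * (1.0 - theta), (1.0 - theta) ** 2]

def chi_square(theta, counts):
    """X^2 from (2.26) for given class frequencies n_1, ..., n_k."""
    n = sum(counts)
    probs = class_probabilities(theta)
    return sum((n_i - n * g) ** 2 / (n * g) for n_i, g in zip(counts, probs))

def minimum_chi_square(counts, grid_size=9999):
    """Grid search for the value of theta in (0, 1) minimising X^2."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda t: chi_square(t, counts))

counts = [30, 50, 20]  # hypothetical class frequencies, n = 100
theta_hat = minimum_chi_square(counts)
print(theta_hat, chi_square(theta_hat, counts))
```

The grid resolution limits the accuracy of the estimate; a root finder applied to (2.27) or (2.28) would be the natural refinement.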


2.3.4 Method of Moments 


If just $p$ product moments of the distribution $P_\theta \in P = (P_\theta, \theta \in \Omega)$, $\dim(\Omega) = p$, controlling the components of a random sample $Y$ are known as explicit functions of $\theta$, then the method of moments can be used.

Definition 2.12 If $\dim(\Omega) = p$, then an estimator $S_M(Y)$ whose realisation $S_M(Y)$ solves the simultaneous equations

\[ m_r = \mu'_r[S_M(Y)], \quad r = 1, \ldots, p \tag{2.29} \]

is said to be an estimator according to the method of moments. In (2.29) $\mu'_r$ is the usual $r$th moment. Observe that

\[ m_r = \frac{1}{n} \sum_{i=1}^{n} y_i^r. \]

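Example 2.12 Let $Y$ be a random sample from a non-central $CS(\nu, \lambda)$ distribution where $\nu$ and $\lambda$ are assumed to be unknown. Then we get

\[ E(y_i) = \nu + \lambda, \quad \mathrm{var}(y_i) = 2(\nu + 2\lambda), \quad \mathrm{var}(y_i) = E(y_i^2) - [E(y_i)]^2, \quad i = 1, \ldots, n. \]

For $p = 2$ relation (2.29) implies with $r = 1$ and $r = 2$ the system

\[ \bar y = \hat\nu + \hat\lambda \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^{n} y_i^2 = 2(\hat\nu + 2\hat\lambda) + (\hat\nu + \hat\lambda)^2 \]

with the solution $S_M^T = (\hat\nu, \hat\lambda)$ with

\[ \hat\nu = 2\bar y - \frac{1}{2n}\sum_{i=1}^{n}(y_i - \bar y)^2, \quad \hat\lambda = \frac{1}{2n}\sum_{i=1}^{n}(y_i - \bar y)^2 - \bar y. \]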
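The solution of Example 2.12 translates directly into code. The sketch below uses a hypothetical sample and checks that the estimates reproduce the first two empirical moments; note that for some samples the resulting values can fall outside the admissible parameter range, which the sketch does not handle.

```python
def moments_noncentral_chi2(sample):
    """Method-of-moments estimates (nu, lambda) for a non-central CS(nu, lambda) sample."""
    n = len(sample)
    y_bar = sum(sample) / n
    v = sum((y - y_bar) ** 2 for y in sample) / n  # equals m_2 - m_1^2
    nu_hat = 2.0 * y_bar - v / 2.0
    lam_hat = v / 2.0 - y_bar
    return nu_hat, lam_hat

# Hypothetical realisation.
sample = [2.0, 9.5, 4.0, 15.0, 6.5, 11.0]
nu_hat, lam_hat = moments_noncentral_chi2(sample)

# Empirical moments m_1 and m_2 for comparison with system (2.29).
m1 = sum(sample) / len(sample)
m2 = sum(y * y for y in sample) / len(sample)
print(nu_hat, lam_hat)
print(m1 - (nu_hat + lam_hat))
print(m2 - (2.0 * (nu_hat + 2.0 * lam_hat) + (nu_hat + lam_hat) ** 2))
```

The last two printed values are the residuals of the two moment equations and should vanish up to rounding.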


2.3.5 Jackknife Estimators 


It is supposed for this method that an estimator $S(Y)$ for the problem of estimation is already known. The aim is now to improve the given estimator. Here we restrict ourselves to such cases, where $S(Y)$ is biased with respect to $g(\theta) = \psi$. We look for possibilities to reduce the bias $v_n(\theta) = E[S(Y)] - g(\theta)$.


Definition 2.13 Let $S_n(Y)$ be an estimator with respect to $g(\theta)$ including all $n$ ($> 1$) elements of a random sample $Y$. Besides, let $S_{n-1}(Y^{(i)})$ be an element of the same sequence of estimators based on

\[ Y^{(i)} = (y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n)^T. \]

Then

\[ J[S(Y)] = n S_n(Y) - \frac{n-1}{n} \sum_{i=1}^{n} S_{n-1}(Y^{(i)}) \tag{2.30} \]

is said to be the jackknife estimator of first order with respect to $g(\theta)$ based on $S_n(Y)$.


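If $S_n(Y)$ and $S_{n-1}(Y^{(i)})$ have finite expectations and if the bias of $S_n(Y)$ has the form

\[ v_n(\theta) = \sum_{l=1}^{\infty} \frac{a_l(\theta)}{n^l}, \]

then

\[ E\{ J[S(Y)] \} - g(\theta) = n v_n(\theta) - (n-1) v_{n-1}(\theta) = -\frac{a_2(\theta)}{n(n-1)} - \sum_{l=2}^{\infty} a_{l+1}(\theta) \left[ \frac{1}{(n-1)^l} - \frac{1}{n^l} \right] \]

follows such that the order of bias is reduced from $O\left(\frac{1}{n}\right)$ to $O\left(\frac{1}{n^2}\right)$.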
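Example 2.13 Let $Y$ be a random sample whose components have the expectation $\mu$. Further, let $g(\theta) = \mu$ and $S_n(Y) = \bar y_n$. Then the jackknife estimator based on $\bar y_n$ is given by $J(\bar y_n) = \bar y_n$.

Indeed, since the mean of $Y^{(i)}$ is $\frac{n \bar y_n - y_i}{n-1}$, it is

\[ J(\bar y_n) = n \bar y_n - \frac{n-1}{n} \sum_{i=1}^{n} \frac{n \bar y_n - y_i}{n-1} = n \bar y_n - (n-1) \bar y_n = \bar y_n. \]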
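Formula (2.30) can be written as a small generic routine. The sketch below assumes that an estimator is passed as a function of a list of observations; the data are hypothetical. Besides the mean of Example 2.13 it also treats the biased variance estimator with divisor $n$, an additional illustration not contained in the example, for which the jackknife is known to return the unbiased estimator with divisor $n - 1$.

```python
def jackknife(estimator, sample):
    """First-order jackknife (2.30): n * S_n - (n - 1) / n * sum of leave-one-out values."""
    n = len(sample)
    full = estimator(sample)
    leave_one_out = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    return n * full - (n - 1) / n * sum(leave_one_out)

def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)

def biased_variance(values):
    """Variance estimate with divisor n (biased of order 1/n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical data
n = len(sample)
unbiased_variance = sum((v - mean(sample)) ** 2 for v in sample) / (n - 1)

print(jackknife(mean, sample), mean(sample))
print(jackknife(biased_variance, sample), unbiased_variance)
```

Each printed pair should agree up to rounding: the mean is left unchanged, and the bias of the variance estimator is removed completely because its bias has only the term $a_1(\theta)/n$.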


2.3.6 Estimators Based on Order Statistics 


The estimators of this subsection are mainly used to estimate location parameters. First we want to introduce statistics that are important for certain problems of estimation (but also of testing).


2.3.6.1 Order and Rank Statistics 
Definition 2.14 Let $Y$ be a random sample of size $n > 1$ from a certain distribution family.

If we arrange the elements of the realisation $Y$ according to their magnitude, and if we denote the $j$th element of this ordered set by $y_{(j)}$ such that $y_{(1)} \le \ldots \le y_{(n)}$ holds, then

\[ Y_{(.)} = (y_{(1)}, \ldots, y_{(n)})^T \]

is a function of the realisation of $Y$, and $S^*(Y) = Y_{(.)} = (y_{(1)}, \ldots, y_{(n)})^T$ is said to be the order statistic vector, the component $y_{(i)}$ is called the $i$th order statistic, and $y_{(n)} - y_{(1)} = w$ is called the range of $Y$.


Since $Y \in \{Y\}$ implies also $Y_{(.)} \in \{Y\}$, the sample space $\{Y\}$ is mapped by $S^*(Y)$ into itself.


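Theorem 2.7 Let $Y$ be a random sample with continuous components possessing the distribution function $F(y)$ and the density function $f(y)$. Then the density function $h(Y_{(.)})$ is given by

\[ h(Y_{(.)}) = n! \prod_{i=1}^{n} f(y_{(i)}). \tag{2.31} \]

If $1 \le k \le n$ and if $R_k = (y_{(i_1)}, \ldots, y_{(i_k)})^T$ is the vector of a subset with $k$ elements of $Y_{(.)}$, then the density function $h(R_k)$ of $R_k$ is given by

\[ h(R_k) = \frac{n!}{\prod_{j=1}^{k+1} (i_j - i_{j-1} - 1)!} \prod_{j=1}^{k+1} \left[ F(y_{(i_j)}) - F(y_{(i_{j-1})}) \right]^{i_j - i_{j-1} - 1} \prod_{j=1}^{k} f(y_{(i_j)}), \tag{2.32} \]

where we put $i_0 = 0$, $i_{k+1} = n + 1$, $y_{(i_0)} = -\infty$ and $y_{(i_{k+1})} = +\infty$ and observe $y_{(i_1)} \le \ldots \le y_{(i_k)}$.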


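We sketch only the basic idea of the proof. Let $B_{i_j} = (y_{(i_{j-1})}, y_{(i_j)}]$ and let $E$ be the following event: of the components of a random sample $Y$ (and $Y_{(.)}$, respectively), $i_1 - 1$ lie in $B_{i_1}$, $i_2 - i_1 - 1$ in $B_{i_2}$, ..., $n - i_k$ in $B_{i_{k+1}}$. If $P_{i_j}$ is the probability for $y \in B_{i_j}$, then

\[ P_{i_j} = \int_{B_{i_j}} f(y)\, dy = F(y_{(i_j)}) - F(y_{(i_{j-1})}) \]

holds. Since the numbers of components falling into these intervals follow a multinomial distribution with the probabilities $P_{i_j}$, we obtain (2.32) and for $k = n$ also (2.31).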


Corollary 2.2 The density function of the $i$th order statistic is

\[ h(y_{(i)}) = \frac{n!}{(i-1)!\,(n-i)!} \left[ F(y_{(i)}) \right]^{i-1} \left[ 1 - F(y_{(i)}) \right]^{n-i} f(y_{(i)}). \tag{2.33} \]

Especially significant are

\[ h(y_{(1)}) = n \left[ 1 - F(y_{(1)}) \right]^{n-1} f(y_{(1)}) \tag{2.34} \]

and

\[ h(y_{(n)}) = n \left[ F(y_{(n)}) \right]^{n-1} f(y_{(n)}). \tag{2.35} \]

Definition 2.15 Taking the notations of Definition 2.14 into account, let the $n$ positive integers $r_i = r(y_i)$ be defined by $y_i = y_{(r_i)}$. The numbers $r_i$ are called the rank numbers or simply ranks of $y_i$ ($i = 1, \ldots, n$). The vector $R = (r_1, \ldots, r_n)^T = [r(y_1), \ldots, r(y_n)]^T$ is called rank statistic vector of the random sample $Y$, and the components $r(y_i)$ are called rank statistics.


2.3.6.2 L-Estimators 
L-estimators are weighted means of order statistics (where L stands for linear 
combination). 


Definition 2.16 If $Y$ is a random sample and $Y_{(.)}$ the corresponding order statistic vector, then

\[ L(Y) = S_L(Y) = \sum_{i=1}^{n} c_i y_{(i)}; \quad c_i \ge 0, \quad \sum_{i=1}^{n} c_i = 1 \tag{2.36} \]

is said to be an L-estimator.


It has to be indicated with respect to which parameters $L(Y)$ is to be an estimator. In most cases we deal with location parameters. The main reasons for this are the conditions $c_i \ge 0$, $\sum_{i=1}^{n} c_i = 1$. Linear combinations of order statistics without these restrictions can also be used to estimate other parameters, but often they are not called L-estimators.

Thus with $c_1 = -1$, $c_2 = \cdots = c_{n-1} = 0$, $c_n = 1$, we get the range $S(Y) = w = y_{(n)} - y_{(1)}$, which is an estimator with respect to $\sigma = \sqrt{\mathrm{var}(y)}$ in distributions with existing second moment.


Example 2.14 Trimmed mean value


If we put

\[ c_1 = \cdots = c_t = c_{n-t+1} = \cdots = c_n = 0 \quad \text{and} \quad c_{t+1} = \cdots = c_{n-t} = \frac{1}{n - 2t} \]

in (2.36) with $t < \frac{n}{2}$, then the so-called trimmed mean

\[ L_T(Y) = \frac{1}{n - 2t} \sum_{i=t+1}^{n-t} y_{(i)} \tag{2.37} \]

arises. It is used if some measured values of the realised sample can be strongly influenced by observation errors (so-called outliers). For $n = 2t + 1$ the $\frac{t}{n}$-trimmed mean is the sample median

\[ L_T(Y) = y_{(t+1)} = y_{(n-t)}. \tag{2.38} \]


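Example 2.15 Winsorized mean
If we do not suppress, as in Example 2.14, the $t$ smallest and the $t$ largest observations, but concentrate them in the values $y_{(t+1)}$ and $y_{(n-t)}$, respectively, then we get the so-called $\frac{t}{n}$-winsorized mean

\[ L_W(Y) = \frac{1}{n} \left[ \sum_{i=t+1}^{n-t} y_{(i)} + t\, y_{(t+1)} + t\, y_{(n-t)} \right] \tag{2.39} \]

with the weights

\[ c_1 = \cdots = c_t = c_{n-t+1} = \cdots = c_n = 0, \quad c_{t+1} = c_{n-t} = \frac{t+1}{n} \quad \text{and} \quad c_{t+2} = \cdots = c_{n-t-1} = \frac{1}{n}. \]

The median in samples of even size $n = 2t$ can be defined as ½-winsorized mean

\[ L_W(Y) = \frac{1}{2} \left( y_{(t)} + y_{(t+1)} \right). \tag{2.40} \]


Definition 2.17 The median $y_{med}$ of a random sample of size $n \ge 2$ is defined by

\[ y_{med} = \mathrm{Med}(Y) = \begin{cases} y_{(n-t)} & \text{for } n = 2t + 1 \\ \frac{1}{2} \left( y_{(t)} + y_{(t+1)} \right) & \text{for } n = 2t. \end{cases} \]

For $n = 2t + 1$ and $t = \frac{n-1}{2}$, respectively, it is $\mathrm{Med}(Y) = L_T(Y)$.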
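The L-estimators of Examples 2.14 and 2.15 and the median of Definition 2.17 can be sketched as follows. The sample with one gross outlier is hypothetical, and the zero-based list indices are shifted by one against the order statistic indices used in the text.

```python
def trimmed_mean(sample, t):
    """Trimmed mean (2.37): drop the t smallest and t largest values."""
    ys = sorted(sample)
    n = len(ys)
    if not 0 <= t < n / 2:
        raise ValueError("t must satisfy 0 <= t < n / 2")
    return sum(ys[t:n - t]) / (n - 2 * t)

def winsorized_mean(sample, t):
    """Winsorized mean (2.39): replace the extremes by y_(t+1) and y_(n-t)."""
    ys = sorted(sample)
    n = len(ys)
    if not 0 <= t < n / 2:
        raise ValueError("t must satisfy 0 <= t < n / 2")
    core = ys[t:n - t]
    # ys[t] is y_(t+1) and ys[n - t - 1] is y_(n-t) in the notation of the text.
    return (sum(core) + t * ys[t] + t * ys[n - t - 1]) / n

def median(sample):
    """Sample median as in Definition 2.17."""
    ys = sorted(sample)
    n = len(ys)
    t = n // 2
    if n % 2 == 1:
        return ys[t]
    return 0.5 * (ys[t - 1] + ys[t])

sample = [3.1, 2.4, 50.0, 2.9, 3.3, 2.7, 3.0]  # hypothetical data with one outlier
print(sum(sample) / len(sample))   # arithmetic mean, pulled upwards by 50.0
print(trimmed_mean(sample, 1))
print(winsorized_mean(sample, 1))
print(median(sample))
```

For this sample the three L-estimators stay close to the bulk of the data, whereas the arithmetic mean does not.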


2.3.6.3. M-Estimators 
Definition 2.18 An estimator $S(Y) = M(Y)$, which minimises for each realisation $Y$ of a random sample $Y$ the expression

\[ \sum_{i=1}^{n} \rho(y_i - S(Y)), \tag{2.41} \]

where

\[ \rho(t) = \begin{cases} \frac{1}{2} t^2 & \text{for } |t| \le k \\ k|t| - \frac{1}{2} k^2 & \text{for } |t| > k \end{cases} \tag{2.42} \]

holds for suitably chosen $k$, is said to be an M-estimator.

Huber (1964) introduced M-estimators for the case that the distributions of the components $y_i$ of a random sample $Y$ have the form

\[ F(y) = (1 - \varepsilon) G(y) + \varepsilon H(y), \]

where $0 < \varepsilon < 1$ and $G$ and $H$ are known distributions. If $0 < \varepsilon < \frac{1}{2}$, then $F$ can be considered as distribution $G$ contaminated by $H$.
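A minimiser of (2.41) with $\rho$ from (2.42) can be found numerically. The following sketch assumes that a simple bisection on the derivative of the objective is acceptable: the sum of the clipped residuals $\sum_i \psi(y_i - m)$, with $\psi = \rho'$, is non-increasing in $m$ and changes sign between the smallest and the largest observation. The function names, the data and the tuning constant $k = 1$ are illustrative assumptions.

```python
def huber_rho(t, k):
    """rho from (2.42)."""
    if abs(t) <= k:
        return 0.5 * t * t
    return k * abs(t) - 0.5 * k * k

def huber_psi(t, k):
    """Derivative of rho: t clipped to the interval [-k, k]."""
    return max(-k, min(k, t))

def huber_objective(m, sample, k):
    """Expression (2.41) evaluated at the candidate value m."""
    return sum(huber_rho(y - m, k) for y in sample)

def huber_location(sample, k, iterations=200):
    """Bisection for a root of m -> sum psi(y_i - m), a minimiser of (2.41)."""
    lo, hi = min(sample), max(sample)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(huber_psi(y - mid, k) for y in sample) > 0:
            lo = mid   # score still positive: the minimiser lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [3.1, 2.4, 50.0, 2.9, 3.3, 2.7, 3.0]  # hypothetical data with one outlier
m_robust = huber_location(sample, k=1.0)
m_large_k = huber_location(sample, k=100.0)    # no clipping: reduces to the mean
print(m_robust, m_large_k)
```

For very large $k$ the function $\rho$ is quadratic on the whole range of the data and the M-estimate coincides with the arithmetic mean; for small $k$ the influence of the outlier is bounded by $k$.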


2.3.6.4 R-Estimators 
Definition 2.19 Let $Y$ be a random sample and $Y_{(.)}$ the corresponding order statistic vector of $Y$. For $1 \le j \le k \le n$ we denote

\[ m_{jk} = \frac{1}{2} \left( y_{(j)} + y_{(k)} \right). \]

Further, let $d_1, \ldots, d_n$ be $n$ given non-negative numbers. The numbers

\[ w_{jk} = \frac{d_{n-(k-j)}}{\sum_{i=1}^{n} i\, d_i} \quad (1 \le j \le k \le n) \]

define the probabilities of a $\frac{1}{2}n(n+1)$-point distribution, that is, of a discrete distribution with $\frac{1}{2}n(n+1)$ possible values $m_{jk}$ (constituting the support), which occur with the positive probabilities $w_{jk}$. If $R(Y)$ is the median of this distribution, then $R(Y)$ is said to be an R-estimator after transition to a random variable.


It is easy to see that the values $w_{jk}$ define a probability distribution. Namely, these $w_{jk}$ are non-negative and also not greater than 1, since the numerators in the defining term are not greater than the common denominator. Finally, considering the $\frac{1}{2}n(n+1)$ pairs $(j, k)$, the numerator $d_n$ occurs $n$ times, the numerator $d_{n-1}$ occurs $(n-1)$ times and so on, up to the numerator $d_1$ that occurs once (viz. for the pair $j = 1$, $k = n$).


Example 2.16 Hodges–Lehmann estimator
Assume that $d_1 = \cdots = d_n = 1$. Then $R(Y)$ is the median of the $m_{jk}$. This estimator is called Hodges–Lehmann estimator.
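The Hodges–Lehmann estimator of Example 2.16 is the median of all pairwise means $m_{jk}$ with $j \le k$. The following sketch computes it by brute force; the convention of taking the midpoint of the two central values for an even number of means is an assumption made here for definiteness.

```python
def pairwise_means(sample):
    """All m_jk = (y_(j) + y_(k)) / 2 with 1 <= j <= k <= n."""
    ys = sorted(sample)
    n = len(ys)
    return [0.5 * (ys[j] + ys[k]) for j in range(n) for k in range(j, n)]

def hodges_lehmann(sample):
    """Median of the n (n + 1) / 2 pairwise means (all weights d_i equal)."""
    means = sorted(pairwise_means(sample))
    m = len(means)
    if m % 2 == 1:
        return means[m // 2]
    return 0.5 * (means[m // 2 - 1] + means[m // 2])

sample = [1.0, 2.0, 4.0]  # hypothetical data
print(sorted(pairwise_means(sample)))
print(hodges_lehmann(sample))
```

The brute-force computation needs of the order of $n^2$ pairwise means, which is unproblematic for small samples.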


2.4 Properties of Estimators 


If we construct R-optimal estimators as in Section 2.1, we know in the case of 
global R-optimality that the obtained estimator is the best one in the sense of R- 
optimisation. Sometimes it is interesting to know how these optimal estimators 
behave according to other criteria. But it is more important to validate estima- 
tors constructed by methods, which were described in Section 2.3. Is it possible 
to state properties of these estimators?

What can be done if R-optimal solutions do not exist, as was shown in Example 2.2? Are there estimators that have at least asymptotically (i.e. for $n \to \infty$) certain desired properties? We will present some results for such problems.


2.4.1 Small Samples 


The first question should be to define the meaning of 'small' in this connection. This verbal expression has become a technical term of statistics. The focus is then on samples of such a size that exact methods are needed and the approximate use of asymptotic results is excluded. This holds mainly for samples of size $n < 50$. For larger samples, asymptotic results can partly be used, supplying good approximations for sequences of samples with $n \to \infty$. It depends on the problem from which $n$ on this is possible. We will see in Chapter 9 about non-linear regression that in special cases asymptotic results can be exploited already for $n = 4$. But this is the exception. Unfortunately in most cases, it is not known where the limit of applicability really lies.

In this section we describe properties that hold for each $n > 1$. Such essential properties are to be unbiased (Definition 2.2) or to be in $\Omega$ global variance-optimal unbiased (Definition 2.3). If no local variance-optimal unbiased estimator exists, then the relative efficiency in Definition 2.5 can be extended to arbitrary estimators in $D_E$ fulfilling condition V1 in Definition 1.10, and the variance in Definition 2.5 can be replaced by the lower bound given in the inequality of Rao and Cramér.

All random samples and estimators of this section may be assumed to satisfy the assumption V1 of Definition 1.10. Let the components of a random sample $Y$ be distributed as $P_\theta \in P = (P_\theta, \theta \in \Omega)$ with $\dim(\Omega) = 1$. We start with the generalised concept of relative efficiency.


Definition 2.20 Let $S_1 = S_1(Y)$ and $S_2 = S_2(Y)$ be two unbiased estimators based on the random sample $Y$ with respect to $g(\theta)$. Then

\[ e(S_1, S_2) = \frac{\mathrm{var}[S_1(Y)]}{\mathrm{var}[S_2(Y)]} \tag{2.43} \]

is said to be the relative efficiency of $S_2$ with respect to $S_1$. For each unbiased estimator $S = S(Y)$ with respect to $g(\theta)$, the quotient

\[ e(S) = \frac{\left( \frac{\partial g(\theta)}{\partial\theta} \right)^2}{I_n(\theta)\, \mathrm{var}[S(Y)]} \tag{2.44} \]

is called efficiency function, where $I_n(\theta)$ denotes the Fisher information (see (1.16)).

The concepts of efficiency just introduced in (2.43) and (2.44) are not related to the existence of a UVUE; they need weaker assumptions as, for example, the existence of the second moments of $S_1$ and $S_2$ in (2.43) or the assumptions of Theorem 1.8 with respect to $g(\theta)$ in (2.44). The equation (2.44) measures the variance of all $S(Y) \in D_E$ at the lower bound of the inequality of Rao and Cramér for $dE[M(Y)]/d\theta = dg(\theta)/d\theta$. Sometimes it is interesting to compare estimators with different bias according to the risk (2.2), which is based on the quadratic loss in (2.1).


Definition 2.21 If $S(Y)$ is an estimator with respect to $g(\theta) = \psi$ with the bias $v_n = v_n(\theta)$ according to Definition 2.2 and if the second moment of $S(Y)$ exists, then

\[ \mathrm{MSD}[S(Y)] = E\{ [\psi - S(Y)]^2 \} = \mathrm{var}[S(Y)] + v_n^2 \tag{2.45} \]

is said to be the mean square deviation of $S(Y)$. For two estimators with existing second moments, the quotient

\[ r(S_1, S_2) = \frac{\mathrm{MSD}[S_1(Y)]}{\mathrm{MSD}[S_2(Y)]} \tag{2.46} \]

is called relative mean square deviation of $S_2(Y)$ with respect to $S_1(Y)$.


The following example shows that there exist estimators outside of $D_E$ with a mean square deviation smaller than that of the UVUE.


Example 2.17 If the components of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ ($n > 1$) are distributed as $N(\mu, \sigma^2)$ and if $g(\theta) = \sigma^2$, then $s^2$ is a UVUE with respect to $\sigma^2$ (see Example 2.4c). The formula for the variance of the $\chi^2$ distribution implies that $\mathrm{var}(s^2) = \frac{2\sigma^4}{n-1}$ holds. The maximum likelihood estimator

\[ \tilde\sigma^2 = \frac{n-1}{n} s^2 \]

has the bias $v_n(\theta) = -\frac{\sigma^2}{n}$ and the variance

\[ \mathrm{var}(\tilde\sigma^2) = \frac{(n-1)^2}{n^2} \mathrm{var}(s^2) = \frac{2(n-1)}{n^2} \sigma^4. \]

Therefore it is

\[ \mathrm{MSD}(s^2) = \mathrm{var}(s^2) = \frac{2\sigma^4}{n-1} \quad \text{and} \quad \mathrm{MSD}(\tilde\sigma^2) = \mathrm{var}(\tilde\sigma^2) + v_n^2(\theta) = \frac{2n-1}{n^2} \sigma^4. \]

We get

\[ r(\tilde\sigma^2, s^2) = \frac{(2n-1)(n-1)}{2n^2} = \frac{2n^2 - 3n + 1}{2n^2} < 1. \]

That is, $\mathrm{MSD}(\tilde\sigma^2)$ is always smaller than $\mathrm{MSD}(s^2)$. Therefore $\tilde\sigma^2$ is, with respect to the risk function $R(\psi, S)$, uniformly better in $\Omega$ than $s^2$ (and $s^2$ is not admissible in the sense of Definition 1.15). Nevertheless $s^2$ is used in applications with only a few exceptions. The equivariant estimator with minimal MSD is according to Example 2.9

\[ \hat\sigma^2 = \frac{SS_y}{n+1} = \frac{n-1}{n+1} s^2 \]

with the bias $-\frac{2}{n+1}\sigma^2$. Consequently we obtain

\[ \mathrm{MSD}(\hat\sigma^2) = \mathrm{var}(\hat\sigma^2) + \frac{4}{(n+1)^2}\sigma^4 = \frac{2}{n+1}\sigma^4 \]

and because of $n > 1$

\[ r(\hat\sigma^2, \tilde\sigma^2) = \frac{2n^2}{(2n-1)(n+1)} = \frac{2n^2}{2n^2 + n - 1} < 1. \]

Among the three estimators, $\hat\sigma^2$ has the largest bias, but the smallest MSD.
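The comparison in Example 2.17 can be reproduced with exact rational arithmetic. The sketch below evaluates $\mathrm{MSD}[\alpha\, SS_y]/\sigma^4 = \alpha^2(n-1)(n+1) - 2\alpha(n-1) + 1$ from Example 2.9 for the three divisors $n - 1$, $n$ and $n + 1$; the sample size $n = 10$ is an arbitrary illustrative choice.

```python
from fractions import Fraction

def msd_scaled_ss(alpha, n):
    """MSD[alpha * SS_y] / sigma^4 for normally distributed components."""
    alpha, n = Fraction(alpha), Fraction(n)
    return alpha ** 2 * (n - 1) * (n + 1) - 2 * alpha * (n - 1) + 1

n = 10  # illustrative sample size
msd_unbiased = msd_scaled_ss(Fraction(1, n - 1), n)     # s^2 = SS_y / (n - 1)
msd_ml = msd_scaled_ss(Fraction(1, n), n)               # SS_y / n
msd_equivariant = msd_scaled_ss(Fraction(1, n + 1), n)  # SS_y / (n + 1)

print(msd_unbiased, msd_ml, msd_equivariant)
print(msd_ml / msd_unbiased, msd_equivariant / msd_ml)
```

The two ratios printed last correspond to $r(\tilde\sigma^2, s^2)$ and $r(\hat\sigma^2, \tilde\sigma^2)$ and are both smaller than 1.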


Definition 2.22 If $S_1(Y)$ is an unbiased estimator of $\psi_1 = g_1(\theta)$ and $S_2(Y)$ an unbiased estimator of $\psi_2 = g_2(\theta)$, then

\[ e_{\psi_1 \psi_2}(S_1, S_2) = \left( \frac{dg_2(\theta)/d\theta}{dg_1(\theta)/d\theta} \right)^2 \frac{\mathrm{var}[S_1(Y)]}{\mathrm{var}[S_2(Y)]} \tag{2.47} \]

is said to be Pitman efficiency of $S_2(Y)$ with respect to $S_1(Y)$ (Pitman 1979). Here the existence of the derivatives of $g_1$ and $g_2$ and of the second moments of the estimators is supposed.


For $g_1 = g_2$ the efficiency (2.47) is reduced to (2.43).


2.4.2 Asymptotic Properties 


Sometimes it is useful to investigate the limit behaviour of estimator sequences for $n \to \infty$.

Briefly, an estimator possesses a certain asymptotic property, if the corresponding estimator sequence possesses this property. In each case we suppose a sequence $Y_1, Y_2, \ldots$ of random samples $Y_n = (y_1, y_2, \ldots, y_n)^T$ with $n = 1, 2, \ldots$.


Definition 2.23 Let $S_1, S_2, \ldots$ be a sequence $\{S_n\}$ of estimators with respect to $g(\theta)$, where $S_n = S(Y_n)$. Then $\{S_n\}$ is said to be consistent, if this sequence stochastically converges to $g(\theta)$ for all $\theta \in \Omega$, that is, if for all $\varepsilon > 0$

\[ \lim_{n \to \infty} P\{ \| S_n - g(\theta) \| \ge \varepsilon \} = 0 \]

holds. Further, the sequence $\{S_n\}$ is called asymptotically unbiased, if the sequence $v_n(\theta) = E(S_n) - g(\theta)$ of bias tends to zero for all $\theta \in \Omega$, that is,

\[ \lim_{n \to \infty} v_n(\theta) = 0. \]


The concept of consistency is not really suitable to evaluate competing estimators. Thus the estimators $s^2$, $\tilde\sigma^2 = \frac{n-1}{n}s^2$ and $\hat\sigma^2 = \frac{n-1}{n+1}s^2$ of Section 2.4.1 are consistent (in the family of normal distributions) with respect to $\sigma^2$; all three are also asymptotically unbiased. But we have

\[ \mathrm{MSD}(\hat\sigma^2) < \mathrm{MSD}(\tilde\sigma^2) < \mathrm{MSD}(s^2). \]


Definition 2.24 Let $\{S_{1,n}\}$ be a sequence of estimators with respect to $g(\theta)$ and let $\sqrt{n}[S_{1,n} - g(\theta)]$ be distributed asymptotically as $N(0, \sigma_1^2)$. Additionally, let $\{S_{2,n}\}$ be another sequence of estimators with respect to $g(\theta)$ so that $\sqrt{n}[S_{2,n} - g(\theta)]$ is distributed asymptotically as $N(0, \sigma_2^2)$. Then the quotient

\[ e_A(S_1, S_2) = \frac{\sigma_1^2}{\sigma_2^2} \tag{2.48} \]

is said to be the asymptotic relative efficiency of $\{S_{2,n}\}$ with respect to $\{S_{1,n}\}$. Here $\sigma_i^2$ is called the asymptotic variance of $\{S_{i,n}\}$ ($i = 1, 2$).


A general definition of the asymptotic relative efficiency of two sequences of estimators can be given also for the case that the limit distributions are not normal distributions.


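Example 2.18 We consider the asymptotic relative efficiency of the sample median with respect to the arithmetic mean based on location families of distributions $P_\theta$. If $F(y - \theta)$ is the distribution function and $L(y, \theta) = f(y - \theta)$ the density function of the components of the random sample $Y = (y_1, y_2, \ldots, y_n)^T$, then $\theta$ is for $F(0) = 1/2$ and $f(0) > 0$ the median of the distribution $P_\theta$. Now let

\[ S_{2,n} = \tilde y_n = \begin{cases} y_{(m+1)} & \text{for } n = 2m + 1 \\ \frac{1}{2} \left[ y_{(m)} + y_{(m+1)} \right] & \text{for } n = 2m \end{cases} \]

be the median of $Y_n$.

We show that $\sqrt{n}(\tilde y_n - \theta)$ is distributed asymptotically as $N\left(0, \frac{1}{4 f^2(0)}\right)$. First let $n = 2m + 1$. Since the distribution of $\tilde y_n - \theta$ is independent of $\theta$,

\[ P_\theta\{ \sqrt{n}(\tilde y_n - \theta) \le c \} = P_0\{ \sqrt{n}\, \tilde y_n \le c \} = P_0\left\{ \tilde y_n \le \frac{c}{\sqrt{n}} \right\} \]

holds for real $c$. If $w_n$ is the number of realisations $y_i$ greater than $\frac{c}{\sqrt{n}}$, then $\tilde y_n \le \frac{c}{\sqrt{n}}$ is satisfied iff $w_n \le m = \frac{n-1}{2}$. Observing that $w_n$ is distributed as $B(n, p_n)$ with $p_n = 1 - F\left(\frac{c}{\sqrt{n}}\right)$, the relation

\[ P_\theta\{ \sqrt{n}(\tilde y_n - \theta) \le c \} = P_0\left\{ w_n \le \frac{n-1}{2} \right\} = P_0\left\{ \frac{w_n - n p_n}{\sqrt{n p_n (1 - p_n)}} \le \frac{\frac{1}{2}(n-1) - n p_n}{\sqrt{n p_n (1 - p_n)}} \right\} \]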


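is fulfilled. If we apply the inequality of Berry and Esseen [Berry (1941), Esseen (1944), see also Lehmann and Romano (2008)] (taking into account that the third moment exists for the binomial distribution), it follows that the difference

\[ P_0\left\{ w_n \le \frac{n-1}{2} \right\} - \Phi(u_n) \quad \text{with} \quad u_n = \frac{\frac{1}{2}(n-1) - n p_n}{\sqrt{n p_n (1 - p_n)}} \]

tends to 0 for $n \to \infty$ ($\Phi$ distribution function of the $N(0, 1)$ distribution). It is

\[ \lim_{n \to \infty} u_n = \lim_{n \to \infty} \frac{1}{\sqrt{p_n (1 - p_n)}} \left[ \sqrt{n} \left( \frac{1}{2} - p_n \right) - \frac{1}{2\sqrt{n}} \right]. \]

For $n \to \infty$ the sequence $F\left(\frac{c}{\sqrt{n}}\right)$ converges to $F(0) = \frac{1}{2}$ and therefore $p_n(1 - p_n)$ to $\frac{1}{4}$. Hence,

\[ \lim_{n \to \infty} u_n = 2 \lim_{n \to \infty} \sqrt{n} \left( \frac{1}{2} - p_n \right) = 2c \lim_{n \to \infty} \frac{F\left(\frac{c}{\sqrt{n}}\right) - F(0)}{\frac{c}{\sqrt{n}}}. \]

But the limit on the right-hand side of the equation is the first derivative of $F(y)$ at $y = 0$, that is,

\[ \lim_{n \to \infty} u_n = 2c f(0). \]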


Consequently Pt Ja (ym) -9) <c} tends to ®[2cf(0)] for noo. If y is 
distributed as N(0, 0”), then we get 


o(<) =P(2 <<) =P(y<c), 


oO O 
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and vice versa. P(y<c) = &(2) implies that y is distributed as N(0, 0”). Therefore 
1 
Jn (Hm -6) is distributed asymptotically as N (sam): It can be shown 


[see Lehmann and Romano (2008)] that this is also the case for even n, which 
finally means for arbitrary n. 
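Now we consider on the other hand the arithmetic mean

$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i.$$

It is well known that $\bar{y}$ is distributed with expectation $\theta$ and variance $\frac{\sigma^2}{n}$, which
means that $\sqrt{n}(\bar{y} - \theta)$ has expectation 0 and variance $\sigma^2$. Hence the distribution
of $\sqrt{n}(\bar{y} - \theta)$ converges to the $N(0, \sigma^2)$ distribution. By (2.48) we obtain

$$e_A(\tilde{y}, \bar{y}) = 4\sigma^2 f^2(0).$$

If $y$ is distributed as $N(\theta, \sigma^2)$, then $f(0) = \frac{1}{\sigma\sqrt{2\pi}}$ and

$$e_A(\tilde{y}, \bar{y}) = \frac{2}{\pi} = 0.6366.$$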
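
The value $2/\pi$ can be checked by simulation. The following Python sketch is only an illustration under stated assumptions (standard normal data, sample size 101, 4000 replications, a fixed seed; the function name `simulate_are` is hypothetical and not part of the text). It estimates the ratio of the variance of the sample mean to the variance of the sample median, which should lie near 0.64 for moderately large $n$.

```python
import random
import statistics

def simulate_are(n=101, reps=4000, seed=1):
    """Estimate var(mean) / var(median) for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # Both estimators are unbiased here, so the variance ratio
    # approximates the asymptotic relative efficiency of the median.
    return statistics.pvariance(means) / statistics.pvariance(medians)

if __name__ == "__main__":
    print(round(simulate_are(), 3))  # expected to be close to 2/pi
```

The result is a Monte Carlo estimate, so it fluctuates around the limit $2/\pi$ and depends on the chosen seed and sample size.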


Bahadur (1964) showed the following result under certain regularity conditions,
which are omitted here. If $\sqrt{n}[S_n(y) - \theta]$ is distributed asymptotically as
$N(0, \sigma^2(\theta))$ for estimators $S_n(y)$ with respect to $\theta$, then

$$\sigma^2(\theta) \ge \frac{1}{I(\theta)} \tag{2.49}$$

is true, where $I(\theta)$ denotes the Fisher information with respect to $P_\theta$.


Definition 2.25 Let $S_n(y)$ be an estimator with respect to $\theta \in \Omega$ and let us
assume that the Fisher information with respect to $P_\theta$ exists. Further, let
$\sqrt{n}[S_n(y) - \theta]$ be asymptotically distributed as $N(0, \sigma^2(\theta))$. If the equality holds
for $\sigma^2(\theta)$ in (2.49), then $S_n(y)$ is said to be a best asymptotically normally distributed
estimator or simply BAN estimator.

Let $\theta^T = (\theta_1, \ldots, \theta_p)$ and let the information matrix $I(\theta)$ defined in Section 1.4
exist and be positive definite. Then a (vectorial) estimator $S_n$ with respect to $\theta$ is
called BAN estimator, if $\sqrt{n}[S_n - \theta]$ is asymptotically distributed as $N[0_p, I^{-1}(\theta)]$.


The following theorem is given without proof. 


Theorem 2.8 Let $L(y, \theta)$ be the likelihood function of the components in the
sequence $\{Y_n\}$ of random samples and assume that $\ln L(y, \theta)$ has second partial
derivatives with respect to all components of $\theta$. For sufficiently small $\varepsilon > 0$ and for all
$\theta_0 \in \Omega$ with $|\theta_0 - \theta| < \varepsilon$, let the supremum of

$$\left|\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\ln L(y, \theta_0) - \frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\ln L(y, \theta)\right|$$

be bounded with respect to $y$ by a function, which is integrable relative to the
components of $\theta$. Let the sequence $\{\hat{\theta}_n\}$ of maximum likelihood estimators be
consistent. Finally assume that the information matrix $I(\theta)$ exists and is positive
definite. Then $\{\hat{\theta}_n\}$ is a BAN estimator with respect to $\theta$.


Generally BAN estimators are not unique. For example, the estimators $s^2$,
$\tilde{\sigma}^2$ and $\hat{\sigma}^2$ given in Section 2.4.1 are BAN estimators.


2.5 Exercises 


2.1 Let y be a random variable whose values -1, 0, 1, 2, 3 occur with the 
probabilities 


$$P(y = -1) = 2p(1-p), \quad P(y = k) = p^k(1-p)^{3-k}, \quad 0 < p < 1, \; k = 0, 1, 2, 3.$$


a) Show that this defines a probability distribution for y. 

b) Give the general form of all functions U(y), which are unbiased with 
respect to 0. 

c) Determine locally variance-optimal unbiased estimators for p and for 
p(1 - p) on the basis of Theorem 2.3. 

d) Are the LVUE obtained in c) also UVUE? Check the necessary and suf- 
ficient condition (2.11). 


2.2 Let $Y = (y_1, y_2, \ldots, y_n)^T$, $n \ge 1$ be a random sample from a binomially distributed
population with parameters $n$ and $p$, $0 < p < 1$, $n$ fixed. Determine the
uniformly variance-optimal unbiased estimator for $p$ and for $p(1 - p)$.


2.3 Let $Y = (y_1, y_2, \ldots, y_n)^T$, $n > 1$ be a random sample, where the second
moments for the components exist and are equal, that is, $\mathrm{var}(y) = \sigma^2 < \infty$.

a) Show that

$$S(Y) = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$$

is unbiased with respect to $\sigma^2$.

b) Suppose that the random variables $y_i$ take the values 0 and 1 with
the probabilities $P(y_i = 0) = 1 - p$ and $P(y_i = 1) = p$, $0 < p < 1$,
respectively. Prove that in this case $S(Y)$ is a uniformly variance-optimal unbiased
estimator with respect to $p(1 - p)$.


2.4 Let $Y = (y_1, y_2, \ldots, y_n)^T$, $n \ge 1$ be a random sample whose components
have the distribution $P_\theta$. Calculate the maximum likelihood estimator with respect to $\theta$ as well as
the estimator according to the method of moments using the first usual
moments of $P_\theta$, where $P_\theta$ is the uniform distribution in the interval
(a) $(0, \theta)$, (b) $(\theta, 2\theta)$ and (c) $(\theta, \theta + 1)$.


2.5 Let $Y = (y_1, y_2, \ldots, y_n)^T$, $n \ge 1$ be a random sample from a population uniformly
distributed in the interval $(0, \theta)$, $\theta \in R^+$. Let $S_{ML}(Y)$ and $S_M(Y)$ be
the estimators described in Exercise 2.4 (a).

a) Are the estimators $S_{ML}(Y)$ and $S_M(Y)$ unbiased with respect to $\theta$?
If not, then change them in such a way that unbiased estimators
$\tilde{S}_{ML}(Y)$ and $\tilde{S}_M(Y)$ are created.

b) Determine the UVUE with respect to $\theta$ and the relative efficiency of
$\tilde{S}_{ML}(Y)$ and $\tilde{S}_M(Y)$.


2.6 Consider three stochastically independent random samples
$X = (x_1, x_2, \ldots, x_n)^T$, $Y = (y_1, y_2, \ldots, y_m)^T$ and $Z = (z_1, z_2, \ldots, z_p)^T$.
Let the random variables $x_i$, $y_i$, $z_i$ be distributed as $N(a, \sigma_a^2)$, $N(b, \sigma_b^2)$ and
$N(c, \sigma_c^2)$, respectively. Further, we suppose that $\sigma_a^2$, $\sigma_b^2$, $\sigma_c^2$ are known and
that $c = a + b$ holds.

a) Determine ML estimators for $a$, $b$ and $c$, where only the sample from the
population is used for estimation whose expectation is to be estimated.

b) Calculate estimators for $a$, $b$ and $c$ applying the maximum likelihood
method, if the united sample and $c = a + b$ are used for estimation.

c) Determine the expectations and variances of the ML estimators from
(a) and (b).


2.7 The task is to estimate the parameter $\theta$ in the model

$$y_i = f_i(x_i, \theta) + e_i, \quad i = 1, \ldots, n.$$

Further, assume that the random variables $e_i$ are distributed as $N(0, \sigma^2)$ and
are stochastically independent. Show that the maximum likelihood method
and the least squares method are equivalent under these conditions.


2.8 a) Let $Y = (y_1, y_2, \ldots, y_n)^T$, $n \ge 1$ be a random sample with $E(y_i) = \theta < \infty$
($i = 1, \ldots, n$). Determine the LSM estimator of the expectation $\theta$.

b) Estimate the parameters $\alpha$ and $\beta$ of the linear model

$$y_i = \alpha + \beta x_i + e_i, \quad i = 1, \ldots, n$$

according to the least squares method, where $x_i \ne x_j$ holds for at least one
pair $(i, j)$ of the indices.


2.9 Let $Y = (y_1, y_2, \ldots, y_n)^T$, $n \ge 1$ be a random sample whose components
are uniformly distributed in $(0, \theta)$, and let $S(Y) = y_{(n)}$ be the maximum
likelihood estimator with respect to $\theta$. Calculate the bias of this
estimator.


2.10 Let $y_1, y_2, \ldots, y_n$ be independently and identically distributed
positive random variables with $E(y_i) = \mu > 0$, $\mathrm{var}(y_i) = \sigma^2 < \infty$ and $x_1, x_2, \ldots, x_n$
independently and identically distributed random variables with
$E(x_i) = \eta > 0$, $\mathrm{var}(x_i) = \tau^2 < \infty$. Further, let

$$\mathrm{cov}(x_i, y_j) = \begin{cases} \rho\sigma\tau & \text{for } i = j \\ 0 & \text{for } i \ne j \end{cases}, \quad i, j = 1, \ldots, n, \; |\rho| < 1.$$

First estimate $g(\theta) = \frac{\eta}{\mu}$. Then show that the estimator $\bar{x}/\bar{y}$ and its jackknife
estimator with respect to $g(\theta)$ have biases of order $O(1/n)$ and $O(1/n^2)$,
respectively.


2.11 Let $Y = (y_1, y_2, \ldots, y_n)^T$, $n \ge 1$ be a random sample whose components
are uniformly distributed in the interval $[\mu - \alpha; \mu + \alpha]$.

a) Determine the expectation of the $i$th order statistic ($i = 1, \ldots, n$).

b) Show that the median of the sample (see Definition 2.17) is in this
case an unbiased estimator with respect to $\mu$.


2.12 Let the random sample $Y = (y_1, y_2, \ldots, y_n)^T$ of size $n > 2$ be from a
population exponentially distributed with the parameter $\alpha > 0$.

a) Give the efficiency function for the estimators that are unbiased with
respect to $\alpha$.

b) Starting with the ML estimator for $\alpha$, determine an unbiased estimator
and calculate its relative efficiency.


2.13 Show that the ML estimator of Exercise 2.12 (b) is asymptotically unbiased
and consistent.


2.14 Let $y_1, y_2, \ldots, y_n$ be independently and identically $N(\theta, 2\theta)$-distributed
random variables. Determine the ML estimator of the parameter $\theta > 0$
and check its consistency.


3 Statistical Tests and Confidence Estimations


3.1 Basic Ideas of Test Theory 


Sometimes the aim of investigation is neither to determine certain statistics 
(to estimate parameters) nor to select something, but to test or to examine care- 
fully considered hypotheses (assumptions, suppositions) and often also wishful 
notions on the basis of practical material. Also in this case a mathematical 
model is established where the hypothesis can be formulated in the form of 
model parameters. We want to start with an example. 

Table potatoes are examined, among other things, for infection by
so-called brown rot. Since a potato has to be cut for this examination,
it is impossible to examine the whole production. Hence, the examiner
takes at random a certain number $n$ of potatoes from the produced amount
of potatoes and decides to award the rating 'table potatoes' if the number $r$
of low-quality potatoes is less than or equal to a certain number $c$, and otherwise
he declines to do so. (For example, we can suppose that a quantity of potatoes
is classified as table potatoes, if the portion $p$ of damaged or bad potatoes is
smaller than or equal to 3%.) This is a typical statistical problem, because a
conclusion is drawn from a random sample (the $n$ examined potatoes) about a population
(the whole amount of potatoes of a certain producer in a certain year).

The situation described above is a bit more complicated than that for estimation
and selection problems, because evidently two wrong decisions can appear
with different effects. We call the probability of making an error of the first kind or
type I error (e.g. by classifying table potatoes wrongly as fodder potatoes) the
risk of the first kind $\alpha$ and correspondingly the probability of making an error
of the second kind or type II error (e.g. by classifying fodder potatoes wrongly
as table potatoes) the risk of the second kind $\beta$.

The two errors have different consequences. Assuming that table potatoes are
more expensive than fodder potatoes, the error of the first kind implies that
the producer is not rewarded for his effort to supply good quality; therefore
the risk of the first kind is also called producer's risk. However, the error of the
second kind implies that the consumers get bad quality for their money; therefore
the risk of the second kind is also called consumer's risk. The values of
$\alpha$ and $\beta$ depend on $n$ and $c$, and conversely, $n$ and $c$ have to be chosen
suitably for given $\alpha$ and $\beta$.

Generally a statistical test is a procedure to allow a decision for accepting or
rejecting a hypothesis about the unknown parameter occurring in the distribution
of a random variable. We shall suppose in the following that two hypotheses
are possible. The first (or main) hypothesis is called null hypothesis $H_0$, and the
other one alternative hypothesis $H_A$. The hypothesis $H_0$ is right if $H_A$ is
wrong, and vice versa. Hypotheses can be composite or simple. A simple
hypothesis prescribes the parameter value $\theta$ uniquely, for example, the hypothesis
$H_0: \theta = \theta_0$ is simple. A composite hypothesis admits that the parameter $\theta$
can have several values.

Examples for composite null hypotheses are: 


$H_0: \theta = \theta_0$ or $\theta = \theta_1$,
$H_0: \theta < \theta_1$,
$H_0: \theta \ne \theta_0$.


Let $Y$ be a random sample of size $n$, and let the distribution of its components
belong to a family $P = \{P_\theta, \theta \in \Omega\}$ of distributions. We suppose the null hypothesis
$H_0: \theta \in \omega = \Omega_0 \subset \Omega$ and the alternative hypothesis $H_A: \theta \in \Omega_A = \Omega \setminus \omega \subset \Omega$.
We denote the acceptance of $H_0$ by $d_0$ and the rejection of $H_0$ by $d_A$. A non-randomised
statistical test has the property that it is fixed for each possible realisation
$Y$ of the random sample in the sample space $\{Y\}$ whether the decision
has to be $d_0$ or $d_A$. According to this test, the sample space $\{Y\}$ is decomposed
into two disjoint subsets $\{Y_0\}$ and $\{Y_A\}$ ($\{Y_0\} \cap \{Y_A\} = \emptyset$, $\{Y_0\} \cup \{Y_A\} = \{Y\}$), defining
the decision function

$$d(Y) = \begin{cases} d_0 & \text{for } Y \in \{Y_0\} \\ d_A & \text{for } Y \in \{Y_A\} \end{cases}$$


The set $\{Y_0\}$ is called acceptance region, and the set $\{Y_A\}$ critical region or
rejection region. We consider a simple case for illustration. Let $\theta$ be a one-dimensional
parameter in $\Omega = (-\infty, \infty)$. We suppose that the random variable
$y$ has the distribution $P_\theta$. With respect to the parameter, two simple hypotheses
are established, the null hypothesis $H_0: \theta = \theta_0$ and the alternative hypothesis
$H_A: \theta = \theta_1$, where $\theta_0 < \theta_1$. Based on the realisation of a random sample
$Y = (y_1, \ldots, y_n)^T$, we have to decide between both hypotheses. We calculate an
estimator $\hat{\theta}$ from the sample whose distribution function $G(\hat{\theta}, \theta)$ is known.
Let $\hat{\theta}$ be a continuous variable with the density function $g(\hat{\theta}, \theta)$, which evidently
depends on the true value of the parameter. Consequently $\hat{\theta}$ has under the null
hypothesis the density function $g(\hat{\theta}, \theta_0)$ and under the alternative hypothesis
the density function $g(\hat{\theta}, \theta_1)$. Further, we assume for the sake of simplicity that
$y$ is normally distributed with unknown expectation $\theta = \mu$ and known variance
$\sigma^2$. Then the densities $g(\hat{\theta}, \theta_0)$ and $g(\hat{\theta}, \theta_1)$ are of the same type. Their graphs
have the same shape and are only mutually shifted along the $\hat{\theta}$-axis, as Figure 3.1
shows. Here we put $\hat{\theta} = \bar{y}$.

Figure 3.1 Density functions of the estimator of the location parameter depending on the
hypothesis values $\mu = 0$ and $\mu = 2$, respectively. (The horizontal axis runs from -2.5 to 4;
$\mu_0 = 0$, $\mu_1 = 2$ and the quantiles $z(0.5)$ and $z(1 - \alpha)$ are marked.)

Both hypotheses are simple hypotheses. In this case usually the test statistic

$$z = \frac{\bar{y} - \mu_0}{\sigma}\sqrt{n}$$

is applied where $\bar{y}$ is the mean taken from the random sample $Y$ of size $n$.
Starting with the (random) sample mean $\bar{y}$, first the value $\mu_0$ of the null
hypothesis is subtracted, and then this difference is divided by the standard
deviation $\frac{\sigma}{\sqrt{n}}$ of $\bar{y}$.

Therefore $z = \frac{\bar{y} - \mu_0}{\sigma}\sqrt{n}$ has the variance 1 and under the null hypothesis
the expectation 0. Under the alternative hypothesis, the expectation is
$E(z) = \frac{\mu_1 - \mu_0}{\sigma}\sqrt{n}$. The corresponding number $\frac{\mu_1 - \mu_0}{\sigma}\sqrt{n}$ is called
non-centrality parameter.

To define a test decision on the basis of the realisation $z$ of the test statistic, we
determine for the chosen $\alpha$ with $0 < \alpha < 1$ the $(1 - \alpha)$-quantile $z(1 - \alpha) = z_{1-\alpha}$ of
the standard normal distribution that can be found in Table D.2 and for special
values of $\alpha$ in the last line of Table D.3 (see Appendix). Then the decision is as
follows: reject $H_0$ if $z > z(1 - \alpha)$, the so-called critical value generally denoted by
$\theta_k$, and otherwise accept the null hypothesis (i.e. for $z \le z(1 - \alpha)$).

This decision rule is illustrated in Figure 3.1. In the coordinate system, for
example, the values $\mu_0 = 0$ and $\mu_1 = 2$ are marked on the $z$-axis (where $z$ represents
certain realisations of the test statistic). Taking each of the two values as corresponding
expectation, the curves of the density functions can be plotted (shifted standard
normal distributions). Regarding the left curve belonging to $\mu_0 = 0$, the critical
value, the quantile $z(1 - \alpha)$, is marked on the $z$-axis; besides, a vertical straight
line through $z(1 - \alpha)$ separates certain area parts under both curves (shaded
in Figure 3.1). In Figure 3.1, $\alpha = 0.025$ is chosen, giving $z(1 - \alpha) =
z(0.975) = 1.96$.

The decision of rejecting the null hypothesis, if $z > 1.96$ (where $z$ is obtained
from the realisations of the random sample or in other words is calculated
from the measurements), can be wrong, because such a value $z$ is also possible
for $\mu > 0$ (e.g. for $\mu = 2$).
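
The decision rule can be sketched in a few lines of Python. This is a minimal illustration under the assumptions of this section (normal observations, known $\sigma$, one-sided alternative $\mu_1 > \mu_0$); the function name `one_sided_z_test` and the sample values in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def one_sided_z_test(sample, mu0, sigma, alpha=0.025):
    """Return (z, critical value, reject H0?) for H0: mu = mu0 vs. mu > mu0."""
    n = len(sample)
    # Test statistic: (sample mean - mu0) divided by sigma / sqrt(n).
    z = (fmean(sample) - mu0) * sqrt(n) / sigma
    # (1 - alpha)-quantile of the standard normal distribution.
    critical = NormalDist().inv_cdf(1 - alpha)
    return z, critical, z > critical

if __name__ == "__main__":
    print(one_sided_z_test([2.1, 1.9, 2.4, 1.6], mu0=0.0, sigma=1.0))
```

With $\alpha = 0.025$ the computed critical value agrees with the quantile 1.96 used in Figure 3.1 up to rounding.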

Let us come back to the general case. The probability of getting an estimate
$\hat{\theta} > \theta_k$ for valid null hypothesis is equal to

$$\int_{\theta_k}^{\infty} g(\hat{\theta}, \theta_0)\,d\hat{\theta} = \alpha.$$

The value of $\alpha$ is represented by the darker shaded area under the curve of
$g(\hat{\theta}, \theta_0)$ in Figure 3.1. If the null hypothesis is rejected although it is right, then
an error is made, called error of the first kind. The maximal probability of
wrongly rejecting the null hypothesis in a test is said to be the risk $\alpha$ of the first
kind or significance level. However, the alternative hypothesis is said to have a
significance of $(1 - \alpha) \cdot 100\%$.

The smaller its risk of the first kind, the better a test seems. In practical
investigations a risk of the first kind $\alpha = 0.05$ seems to be only just acceptable
in most cases. Users may ask why the test is not designed in such a way
that $\alpha$ has a very small value, say, $\alpha = 0.00001$. Figure 3.1 clearly illustrates that
the further to the right the bound $\theta_k$ (in this case $z(1 - \alpha)$) between both regions
is shifted, the smaller $\alpha$ (i.e. the area under the curve of $g(\hat{\theta}, \theta_0)$
on the right-hand side of $z(1 - \alpha)$) becomes. But then the probability of making another error
increases. Namely, if we calculate an estimate $\hat{\theta} \le \theta_k$ from the realisation of the
sample, then the null hypothesis is accepted, although this value would also be
possible in the case that the alternative hypothesis is right and consequently the
null hypothesis is wrong.

If we accept the null hypothesis although it is wrong, then another error is
made, called the error of the second kind. The probability $\beta$ of wrongly accepting
the null hypothesis, that is, the probability of making an error of the second
kind, is said to be the risk of the second kind. In Figure 3.1 this risk is represented
by the lightly shaded area under the curve of $g(\hat{\theta}, \theta_1)$ at the left-hand
side of $\theta_k$. Its value is obtained by integrating the density function $g(\hat{\theta}, \theta_1)$ from
$-\infty$ up to $\theta_k$, that is,

$$\int_{-\infty}^{\theta_k} g(\hat{\theta}, \theta_1)\,d\hat{\theta} = \beta.$$
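
For the normal case of Figure 3.1 the two integrals can be evaluated with the standard normal distribution function. The following Python sketch is an illustration only; it assumes that the test statistic has variance 1 and, under the alternative, an expectation equal to the non-centrality parameter (2 in Figure 3.1). The function name `risk_second_kind` is hypothetical.

```python
from statistics import NormalDist

def risk_second_kind(alpha, noncentrality):
    """beta = P(z <= z(1 - alpha)) when z is N(noncentrality, 1)-distributed."""
    std = NormalDist()
    critical = std.inv_cdf(1 - alpha)  # bound between the two regions
    return std.cdf(critical - noncentrality)

if __name__ == "__main__":
    # A smaller alpha moves the bound to the right and enlarges beta.
    for alpha in (0.05, 0.025, 0.01, 0.00001):
        print(alpha, round(risk_second_kind(alpha, 2.0), 4))
```

For $\alpha = 0.025$ and non-centrality parameter 2 the risk of the second kind is about 0.48, so a very small $\alpha$ is paid for with a large $\beta$ as long as the sample size is fixed.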


Figure 3.1 makes clear that $\alpha$ can only be reduced for a certain test and a fixed
sample size if a larger $\beta$ is accepted. Hence, the risks of the first and the second
kind cannot simultaneously be made arbitrarily small for a fixed sample size.
In applying statistical tests, it is wrong but common to focus mainly on the risk
of the first kind while the risk of the second kind is neglected. There are many
examples where the wrong acceptance of the null hypothesis can produce
serious consequences (consider 'genetic corn has no damaging side effects' or
'nuclear power stations are absolutely safe'). Therefore, it is advisable to control
both risks, which is always possible by a suitably chosen sample size. The following
scheme shows the decisions in a statistical test in relation to the true
facts ($H_0$ null hypothesis, $H_A$ alternative hypothesis).


True fact         Decision           Result of the decision      Probability of the result

$H_0$ right       $H_0$ accepted     Right decision              Acceptance (or confidence)
($H_A$ wrong)     ($H_A$ rejected)                               probability $1 - \alpha$

                  $H_0$ rejected     Error of the first kind     Significance (or error) level,
                  ($H_A$ accepted)                               risk $\alpha$ of the first kind

$H_0$ wrong       $H_0$ accepted     Error of the second kind    Risk $\beta$ of the second kind
($H_A$ right)     ($H_A$ rejected)

                  $H_0$ rejected     Right decision              Power $1 - \beta$
                  ($H_A$ accepted)


To be on the safe side, it is recommended to declare that hypothesis as null 
hypothesis, which causes the more serious consequences in the case of wrong 
rejection. 

A generalisation of the situation just described is given if, after knowing the
experimental results, that is, the realisation $\hat{\theta}$ of the statistic, it is not instantly
decided which of the two hypotheses is accepted. Instead a random procedure
(a kind of tossing a coin) is used, accepting the null hypothesis with probability
$1 - k(Y) = 1 - k(\hat{\theta})$ and the alternative hypothesis with probability $k(\hat{\theta})$, if
$Y \in \{Y\}$ was observed (or $\hat{\theta}$ calculated). Although the user of statistical methods
will hardly agree to leave it to chance, after carefully planned and often cost-intensive
experiments, which of the two hypotheses should be
accepted, the theory of testing is firstly based on the concept of such randomised
tests. The significance of the Neyman-Pearson lemma in Section 3.2 is just to
recognise that non-randomised tests are sufficient for continuous distributions.


Definition 3.1 Let $Y$ be a random sample with $Y \in \{Y\}$ whose components are
distributed as $P_\theta \in P = \{P_\theta, \theta \in \Omega\}$. Further, let $k(Y)$ be a measurable mapping of
the sample space $\{Y\}$ onto the interval $[0, 1]$. It is called critical function. If $k(Y)$
states the probability for rejecting $H_0: \theta \in \omega$ (i.e. for accepting $H_A: \theta \in \Omega \setminus \omega$) in
the case that the random sample takes the value $Y \in \{Y\}$, then the critical function defines a
statistical test for the pair $(H_0, H_A)$ of hypotheses. Then $k(Y)$ is shortly called
a test. The test $k(Y)$ is said to be randomised if it does not take with probability
1 only the values 0 or 1.

Now we want to define the risks of the first and the second kind for such
general tests $k(Y)$. In this chapter we consider only such functions $k(Y)$
whose expectation exists for all $\theta \in \Omega$. The notation $E[k(Y) \mid \theta]$ means that the
expectation is taken with respect to the distribution $P_\theta$, $\theta \in \Omega$.


Definition 3.2 If $k(Y)$ is a statistical test for a pair $(H_0, H_A)$ of hypotheses
according to Definition 3.1, then

$$E[k(Y) \mid \theta \in \omega] = \int_{\{Y\}} k(Y)\,dP_\theta = \alpha(\theta), \quad P_\theta \in P, \; \theta \in \omega \tag{3.1}$$

is said to be the risk function of the first kind and

$$1 - E[k(Y) \mid \theta \in \Omega \setminus \omega] = \beta(\theta) \tag{3.2}$$

the risk function of the second kind. The function

$$\pi(\theta) = \int_{\{Y\}} k(Y)\,dP_\theta, \quad P_\theta \in P, \; \theta \in \Omega$$

is said to be the power function of the test. Further

$$\max_{\theta \in \omega} \alpha(\theta) = \alpha$$

is called the significance level of the test $k(Y)$. A test with the significance level $\alpha$
is briefly called an $\alpha$-test (alpha-test).
If $\alpha(\theta) = \alpha$ for all $\theta \in \omega$, then the test $k(Y)$ is said to be $\alpha$-similar or simply
similar.


If $\bar{\omega}$ and $\overline{\Omega \setminus \omega}$, respectively, are the closures of $\omega$ and $\Omega \setminus \omega$, respectively,
and if $\bar{\omega} \cap \overline{\Omega \setminus \omega} = \Omega^*$ is the common boundary of both subsets, then $k(Y)$
is called $\alpha$-similar on the boundary, if $E[k(Y) \mid \theta \in \Omega^*] = \alpha$ is fulfilled with
$P_\theta$-probability 1.


Definition 3.3 Considering tests with simple null and alternative hypotheses,
$\omega = \{\theta_0\}$ and $\Omega \setminus \omega = \{\theta_A\}$, then $k^*(Y)$ is said to be most powerful $\alpha$-test, if
for all $\alpha$ in the interval $(0, 1)$

$$\max_{k(Y) \in K_\alpha} E[k(Y) \mid \theta_A] = E[k^*(Y) \mid \theta_A],$$

where $K_\alpha$ is the class of all $\alpha$-tests.
According to Definition 3.2, an $\alpha$-test $k(Y) \in K_\alpha$ for the pair of hypotheses

$$H_0: \theta = \theta_0, \quad H_A: \theta = \theta_A$$

considered in Definition 3.3 is a test with

$$E[k(Y) \mid \theta_0] = \alpha. \tag{3.3}$$


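Definition 3.4 An $\alpha$-test $k^*(Y)$ for the pair $H_0: \theta \in \omega$ against $H_A: \theta \in \Omega \setminus \omega$ is
said to be uniformly best $\alpha$-test, if

$$E[k^*(Y) \mid \theta \in \Omega \setminus \omega] \ge E[k(Y) \mid \theta \in \Omega \setminus \omega] \tag{3.4}$$

for each other test $k(Y)$ with a significance level not larger than $\alpha$ and for all
$\alpha \in (0, 1)$. The test $k^*(Y)$ is also briefly called a uniformly most powerful test
(UMP-test).


Definition 3.5 If $k(Y)$ is with respect to the pair $H_0: \theta \in \omega$, $H_A: \theta \in \Omega \setminus \omega$ an
$\alpha$-test and if for its power function $\pi(\theta) \ge \alpha$ holds for all $\theta \in \Omega \setminus \omega$ and for all
$\alpha \in (0, 1)$, then $k(Y)$ is said to be an unbiased $\alpha$-test. If $K_{u\alpha}$ is the class of all unbiased
$\alpha$-tests and if

$$\max_{k(Y) \in K_{u\alpha}} E[k(Y) \mid \theta_A] = E[k^{**}(Y) \mid \theta_A] \quad \text{for all } \theta_A \in \Omega \setminus \omega,$$

then $k^{**}(Y)$ is said to be a uniformly most powerful unbiased $\alpha$-test
(UMPU-test).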

We need the following statement in the next sections. 


Lemma 3.1 Let $(H_0, H_A)$ be a pair of hypotheses $H_0: \theta \in \omega$, $H_A: \theta \in \Omega \setminus \omega$
concerning the parameter $\theta$ of the distribution family $P = \{P_\theta, \theta \in \Omega\}$, assuming
that each test has in $\theta$ a continuous power function $\pi(\theta)$. If $k(Y)$ is with respect to
$(H_0, H_A)$ in the class $K_{\alpha^*}$ of all on the boundary $\alpha$-similar tests the uniformly
most powerful $\alpha$-test, then it is also a uniformly most powerful unbiased $\alpha$-test.


The proof uses the fact that the class $K_{\alpha^*}$ contains the class $K_{u\alpha}$ of unbiased
$\alpha$-tests, taking the continuity of $\pi(\theta)$ into account. Considering that $k(Y)$ fulfils
the inequality (3.4) for all $k^*(Y) \in K_{\alpha^*}$, it all the more fulfils this inequality
for all $k^*(Y) \in K_{u\alpha}$. Moreover, $k(Y)$ is in $K_{u\alpha}$, since it fulfils as uniformly most
powerful $\alpha$-test in $K_{\alpha^*}$ also inequality (3.4) for $k^*(Y) \equiv \alpha$. Hence, its power
function in $\Omega \setminus \omega$ cannot lie under that of $k^*(Y)$, which is just $\alpha$.


Example 3.1 Let $Y = (y_1, y_2, \ldots, y_n)^T$ be a random sample of size $n > 1$ from a
$N(\mu, \sigma^2)$-distribution, where $\mu \in R^1 = \Omega$, $\sigma^2$ known. Let $\omega = (-\infty, a]$ and
therefore $\Omega \setminus \omega = (a, \infty)$. Then $z = \frac{\bar{y} - a}{\sigma}\sqrt{n}$ is distributed as $N\left(\frac{\mu - a}{\sigma}\sqrt{n}, 1\right)$.

We consider the test $k(Y)$ with

$$k(Y) = \begin{cases} 0 & \text{for } z \le z_{0.95} \\ 1 & \text{for } z > z_{0.95} \end{cases}$$

Here we have $z_{0.95} = 1.6449$ and $\Phi(z_{0.95}) = 0.95$. This is a 0.05-test because

$$P\{z > z_{0.95} \mid \mu \le a\} \le 0.05.$$

Regarding $P\{z > z_{0.95} \mid \mu > a\} > 0.05$, this is an unbiased 0.05-test. For each other $\alpha$
in the interval $(0, 1)$

$$k(Y) = \begin{cases} 0 & \text{for } z \le z_{1-\alpha} \\ 1 & \text{for } z > z_{1-\alpha} \end{cases}$$

is an unbiased $\alpha$-test, which can easily be seen.
Let $\mu = a + \delta$ ($\delta \ge 0$). Then the power function for $\alpha = 0.05$ is

$$\pi(\delta) = P\{z > 1.6449 \mid \mu = a + \delta\} = 1 - \Phi\left(1.6449 - \frac{\sqrt{n}\,\delta}{\sigma}\right).$$

Table 3.1 lists $\pi(\delta)$ for special $\delta$ and $n$.

In applications $\delta$ is chosen as the practically interesting minimum difference
to the value of the null hypothesis (also called effect size). If we want to overlook
such a difference with probability at most $\beta$, that is, to discover it with probability
at least $1 - \beta$, we have to prescribe a corresponding sample size. Again we consider
the general case that $Y$ is a random sample of size $n$ taken from an $N(\mu, \sigma^2)$-distribution.

Putting $\mu = a + \delta$ ($\delta > 0$), assuming $\alpha = 0.05$ and after that fixing $\beta = 0.1$, then
the difference

$$1.6449 - \frac{\sqrt{n}\,\delta}{\sigma}$$

has to be the 0.1-quantile of the standard normal distribution, namely, $-1.2816$.
Therefore

$$1.6449 - \frac{\sqrt{n}\,\delta}{\sigma} = -1.2816$$

is satisfied. This equation has to be solved for $n$.




Table 3.1 Values of the power function in Example 3.1 for $n$ = 9, 16, 25,
$\sigma$ = 1 and special $\delta$.


$\delta$ $\pi(\delta)$, n = 9 $\pi(\delta)$, n = 16 $\pi(\delta)$, n = 25
0 0.05 0.05 0.05 
0.1 0.0893 0.1066 0.1261 
0.2 0.1480 0.1991 0.2595 
0.3 0.2282 0.3282 0.4424 
0.4 0.3282 0.4821 0.6387 
0.5 0.4424 0.6387 0.8038 
0.6 0.5616 0.7749 0.9123 
0.7 0.6755 0.8760 0.9682 
0.8 0.7749 0.9400 0.9907 
0.9 0.8543 0.9747 0.9978 
1.0 0.9123 0.9907 0.9996 
1.1 0.9510 0.9971 0.9999 
1.2 0.9747 0.9992 1.0000 
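
The entries of Table 3.1 can be recomputed from the power function of Example 3.1. The following Python sketch assumes $\alpha = 0.05$ and $\sigma = 1$ as in the table; the function name `power` is hypothetical, and the quantile is taken from the standard library instead of Table D.2.

```python
from math import sqrt
from statistics import NormalDist

def power(delta, n, sigma=1.0, alpha=0.05):
    """pi(delta) = 1 - Phi(z_{1-alpha} - sqrt(n) * delta / sigma)."""
    std = NormalDist()
    return 1 - std.cdf(std.inv_cdf(1 - alpha) - sqrt(n) * delta / sigma)

if __name__ == "__main__":
    # One row per delta = 0, 0.1, ..., 1.2 with columns n = 9, 16, 25.
    for k in range(13):
        delta = k / 10
        print(delta, [round(power(delta, n), 4) for n in (9, 16, 25)])
```

Because the power depends on $\delta$ and $n$ only through $\sqrt{n}\,\delta$, equal values appear in different columns, for example 0.3282 for $\delta = 0.4$, $n = 9$ and for $\delta = 0.3$, $n = 16$.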


Choosing $\delta = \sigma$, we find

$$1.6449 - \sqrt{n} = -1.2816$$

and $n = 8.56$. But we have to look for the smallest integer $n$, which is larger than or
equal to the calculated value (using the rounding up function CEIL($x$) denoted
here by $\lceil x \rceil$). Hence, we get $n = \lceil(1.6449 + 1.2816)^2\rceil = \lceil 8.56 \rceil$ and that is 9.
Generally, we get the sample size for given $\alpha$, $\beta$, $\sigma$ and $\delta$ by the formula

$$n = \left\lceil \left(z_{1-\alpha} + z_{1-\beta}\right)^2 \frac{\sigma^2}{\delta^2} \right\rceil.$$

We call $\frac{\delta}{\sigma}$ the relative effect size.
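
The sample size formula is easily evaluated numerically. The following Python sketch is an illustration for the one-sided test of Example 3.1; the function name `sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size(alpha, beta, sigma, delta):
    """Smallest n with risks alpha and beta for the effect size delta."""
    std = NormalDist()
    z_sum = std.inv_cdf(1 - alpha) + std.inv_cdf(1 - beta)
    # Round up, since n must be an integer not smaller than the exact value.
    return ceil((z_sum * sigma / delta) ** 2)

if __name__ == "__main__":
    print(sample_size(0.05, 0.1, sigma=1.0, delta=1.0))
    print(sample_size(0.05, 0.1, sigma=1.0, delta=0.5))
```

Halving the relative effect size multiplies the exact value by four, so for $\delta = \sigma/2$ the required sample size rises from 9 to 35.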


3.2. The Neyman-Pearson Lemma 


Neyman and Pearson introduced the following very important theorem as a lemma.


Theorem 3.1 Neyman-Pearson Lemma (Neyman and Pearson, 1933)

Let $L(Y, \theta)$ be the likelihood function of the random sample $Y = (y_1, \ldots, y_n)^T$
with $Y \in \{Y\}$ and $\theta \in \Omega = \{\theta_0, \theta_A\}$ with $\theta_0 \ne \theta_A$. Further, the null hypothesis
$H_0: \theta = \theta_0$ is to be tested against the alternative hypothesis $H_A: \theta = \theta_A$. Then
with a constant $c \ge 0$, the following holds:

1) Each test $k(Y)$ of the form

$$k(Y) = \begin{cases} 1 & \text{for } L(Y, \theta_A) > cL(Y, \theta_0) \\ \gamma(Y) & \text{for } L(Y, \theta_A) = cL(Y, \theta_0) \\ 0 & \text{for } L(Y, \theta_A) < cL(Y, \theta_0) \end{cases} \tag{3.5}$$

with $0 \le \gamma(Y) \le 1$ is for a certain $\alpha = \alpha[c, \gamma(Y)]$ a most powerful $\alpha$-test ($0 < \alpha < 1$).
The test $k(Y)$ with

$$k(Y) = \begin{cases} 1 & \text{for } L(Y, \theta_0) = 0 \\ 0 & \text{for } L(Y, \theta_0) > 0 \end{cases} \tag{3.6}$$

is a best 0-test, and the test $k(Y)$ with

$$k(Y) = \begin{cases} 1 & \text{for } L(Y, \theta_A) > 0 \\ 0 & \text{for } L(Y, \theta_A) = 0 \end{cases} \tag{3.7}$$

is a best 1-test.

2) For testing $H_0$ against $H_A$, there exist for each $\alpha \in (0, 1)$ constants $c = c_\alpha$, $\gamma = \gamma_\alpha$
so that the corresponding test $k(Y)$ in the form (3.5) is a best $\alpha$-test.

3) If $k(Y)$ is a best $\alpha$-test with $\alpha \in (0, 1)$, then it is with probability 1 of the form
(3.5) (apart from the set $\{Y: L(Y, \theta_A) = cL(Y, \theta_0)\}$ of $P_\theta$-measure 0) if there is
no $\alpha_0$-test $k^*(Y)$ with $\alpha_0 < \alpha$ and $E[k^*(Y) \mid \theta_A] = 1$.


Proof:
Assertion (1)
If $\alpha = 0$, then $k(Y)$ satisfies the relation (3.6). If $k'(Y)$ is another 0-test, then

$$E[k'(Y) \mid \theta_0] = \int_{B_0} k'(Y)\,dP_{\theta_0} = 0$$

for $B_0 = \{Y: L(Y, \theta_0) > 0\}$. If $L(Y, \theta_0) > 0$ holds, then $k'(Y)$ has to be equal to 0 with
probability 1. Putting $B_A = \{Y\} \setminus B_0$, we get

$$E[k(Y) \mid \theta_A] - E[k'(Y) \mid \theta_A] = \int_{\{Y\}} [k(Y) - k'(Y)]\,dP_{\theta_A} = \int_{B_A} [k(Y) - k'(Y)]\,dP_{\theta_A} = \int_{B_A} [1 - k'(Y)]\,dP_{\theta_A} \ge 0$$

and therefore the assertion (1) for $\alpha = 0$ from (3.6). Analogously the assertion
(1) follows for $\alpha = 1$ from (3.7). Hence, we can now consider $\alpha$-tests with
$0 < \alpha < 1$. We show that they are most powerful $\alpha$-tests if they fulfil (3.5).

Let $k(Y)$ be an $\alpha$-test of the form (3.5), that is, besides (3.5) assume also

$$E[k(Y) \mid \theta_0] = \alpha. \tag{3.8}$$

If $k'(Y)$ is an arbitrary test with a significance level not larger than $\alpha$, then we
have to show

$$E[k(Y) \mid \theta_A] \ge E[k'(Y) \mid \theta_A]. \tag{3.9}$$

For $L(Y, \theta_A) > cL(Y, \theta_0)$ it is $1 = k(Y) \ge k'(Y)$, and for $L(Y, \theta_A) < cL(Y, \theta_0)$, it is
$0 = k(Y) \le k'(Y)$. That means

$$[L(Y, \theta_A) - cL(Y, \theta_0)]\,[k(Y) - k'(Y)] \ge 0,$$

and therefore

$$\int_{\{Y\}} [k(Y) - k'(Y)]\,[dP_{\theta_A} - c\,dP_{\theta_0}] \ge 0,$$

and further

$$E[k(Y) \mid \theta_A] - E[k'(Y) \mid \theta_A] \ge c\{E[k(Y) \mid \theta_0] - E[k'(Y) \mid \theta_0]\} \ge 0.$$

Finally, this implies (3.9).

Assertion (2)

For $\alpha = 0$ and $\alpha = 1$ the formulae (3.6) and (3.7), respectively, have the
form (3.5) putting $c_0 = \infty$ (where $0 \cdot \infty = 0$), $\gamma_0 = 0$ and $c_1 = 0$, $\gamma_1 = 0$, respectively.
Therefore we can restrict ourselves to $0 < \alpha < 1$.

If we put $\gamma(Y) = \gamma_\alpha$ in (3.5), then the constants $c_\alpha$ and $\gamma_\alpha$ are to be determined
so that

$$\alpha = E[k(Y) \mid \theta_0] = 1 \cdot P[L(Y, \theta_A) > c_\alpha L(Y, \theta_0)] + \gamma_\alpha P[L(Y, \theta_A) = c_\alpha L(Y, \theta_0)],$$

and with the notation

$$q = \frac{L(Y, \theta_A)}{L(Y, \theta_0)},$$

then

$$\alpha = 1 - P[q \le c_\alpha \mid \theta_0] + \gamma_\alpha P[q = c_\alpha \mid \theta_0]$$

holds. In the continuous case we choose the $(1 - \alpha)$-quantile of the distribution
of $q$ for $c_\alpha$ and $\gamma_\alpha = 0$. If $q$ is discrete, then a constant $c_\alpha$ exists in such a way that

$$P[q < c_\alpha \mid \theta_0] \le 1 - \alpha \le P[q \le c_\alpha \mid \theta_0] \tag{3.10}$$

holds. We put

$$\gamma_\alpha = \frac{P[q \le c_\alpha \mid \theta_0] - (1 - \alpha)}{P[q = c_\alpha \mid \theta_0]} \tag{3.11}$$

if the equality does not hold twice (= for both $\le$) in (3.10) (i.e. if $P[q = c_\alpha \mid \theta_0] > 0$).
Otherwise (for vanishing denominator), we proceed just as in the continuous
case and write $k(Y)$ in the form (3.5).
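
The determination of $c_\alpha$ and $\gamma_\alpha$ in the discrete case can be illustrated numerically. The following Python sketch is a hypothetical example that does not appear in the text: a single observation from a binomial distribution $B(10, p)$ with $H_0: p = 0.5$ against $H_A: p = 0.7$ and $\alpha = 0.05$. The function names are assumptions made for this illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def np_constants(n, p0, pA, alpha):
    """Return (c_alpha, gamma_alpha) following (3.10) and (3.11)."""
    pmf0 = [binom_pmf(x, n, p0) for x in range(n + 1)]
    q = [binom_pmf(x, n, pA) / pmf0[x] for x in range(n + 1)]
    for c in sorted(set(q)):
        p_less = sum(p for p, r in zip(pmf0, q) if r < c)
        p_eq = sum(p for p, r in zip(pmf0, q) if r == c)
        # (3.10): P[q < c] <= 1 - alpha <= P[q <= c] under theta_0
        if p_less <= 1 - alpha <= p_less + p_eq:
            gamma = (p_less + p_eq - (1 - alpha)) / p_eq  # (3.11)
            return c, gamma
    raise ValueError("no constant found; check 0 < alpha < 1")

def size(n, p0, pA, c, gamma):
    """E[k(Y) | theta_0] for the test (3.5) with constant gamma."""
    total = 0.0
    for x in range(n + 1):
        r = binom_pmf(x, n, pA) / binom_pmf(x, n, p0)
        if r > c:
            total += binom_pmf(x, n, p0)
        elif r == c:
            total += gamma * binom_pmf(x, n, p0)
    return total

if __name__ == "__main__":
    c, gamma = np_constants(10, 0.5, 0.7, 0.05)
    print(c, gamma, size(10, 0.5, 0.7, c, gamma))
```

In this example the likelihood ratio increases with the number of successes, so the test rejects for 9 or 10 successes and randomises at 8 successes; the randomisation is what makes the size exactly equal to $\alpha$.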

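Assertion (3)

Since $0 < \alpha < 1$ is supposed, we take $k(Y)$ as a most powerful $\alpha$-test of the form
(3.5) with $c = c_\alpha$ and $\gamma(Y) = \gamma_\alpha$ from (3.10) and (3.11), respectively, or, in the
continuous case, with the $(1 - \alpha)$-quantile $c_\alpha$ of $q$ and $\gamma_\alpha = 0$. Let $k'(Y)$ be an arbitrary most powerful $\alpha$-test.
Then both

$$E[k(Y) \mid \theta_0] = E[k'(Y) \mid \theta_0] = \alpha$$

and

$$E[k(Y) \mid \theta_A] = E[k'(Y) \mid \theta_A]$$

must hold, which means

$$\int_{\{Y\}} [k(Y) - k'(Y)]\,dP_\theta = 0, \quad \theta \in \{\theta_0, \theta_A\}$$

as well as

$$\int_{\{Y\}} [k(Y) - k'(Y)]\,[dP_{\theta_A} - c_\alpha\,dP_{\theta_0}] = 0.$$

This implies the assertion. If there is an $\alpha_0$-test $k^*(Y)$ with $\alpha_0 < \alpha$ and
$E[k^*(Y) \mid \theta_A] = 1$, then this conclusion is not possible.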


The Neyman—Pearson lemma has some interesting consequences. 


Corollary 3.1 Let the general assumptions of Theorem 3.1 be fulfilled and
put $\beta = E[k(Y) \mid \theta_A]$. Then always $\alpha < \beta$ for the most powerful $\alpha$-test $k(Y)$ if
$L(Y, \theta_0) \ne L(Y, \theta_A)$.


Proof: Since $E[k^*(Y) \mid \theta_A] = \alpha$ holds for the special $\alpha$-test $k^*(Y) \equiv \alpha$, it follows
$\alpha \le \beta$ for the most powerful $\alpha$-test $k(Y)$. However, $\alpha = \beta$ is not possible.
Otherwise $k^*(Y) \equiv \alpha$ would be a most powerful $\alpha$-test and would have with
probability 1 the form (3.5) considering (3) in Theorem 3.1. Nevertheless, both
would only hold, if $L(Y, \theta_0)$ is with probability 1 equal to $L(Y, \theta_A)$. This contradicts
the assumption of the corollary.

One of the best books on test theory was that of Lehmann (1959), and we
also cite the revised edition of Lehmann and Romano (2008).

Theorem 3.1 can be generalised (for the proof, see Lehmann, 1959, pp. 84-87).


Corollary 3.2 Let $K$ be the set of all critical functions $k(Y)$ of a random sample
$Y$ with respect to a distribution $P_\theta \in P = \{P_\theta, \theta \in \Omega\}$. Further, let $g_1, \ldots, g_m$
and $g_0$ be in $R^n$ defined real $P_\theta$-integrable functions. Additionally, for given real
constants $c_1, \ldots, c_m$ let $k(Y)$ exist so that

$$\int_{\{Y\}} k(Y) g_i(Y)\,dP_\theta = c_i, \quad i = 1, \ldots, m.$$

We denote the class of functions $k(Y) \in K$ satisfying this equation by $K_c$.

1) There is a function $k^*(Y)$ in $K_c$ with the property

$$\int_{\{Y\}} k^*(Y) g_0(Y)\,dP_\theta = \max_{k(Y) \in K_c} \int_{\{Y\}} k(Y) g_0(Y)\,dP_\theta.$$

2) Let real constants $k_1, \ldots, k_m$ and a function $\gamma(Y)$ with $0 \le \gamma(Y) \le 1$ exist so that

$$k^*(Y) = \begin{cases} 1 & \text{for } g_0 > \sum_{i=1}^{m} k_i g_i \\ \gamma(Y) & \text{for } g_0 = \sum_{i=1}^{m} k_i g_i \\ 0 & \text{for } g_0 < \sum_{i=1}^{m} k_i g_i \end{cases}$$

for all $Y \in \{Y\}$. Then (1) is satisfied using this $k^*(Y)$.

3) If $k^*(Y) \in K_c$ fulfils the sufficient condition in (2) with non-negative $k_i$, then

$$\int_{\{Y\}} k^*(Y) g_0(Y)\,dP_\theta = \max_{k(Y) \in K_c^*} \int_{\{Y\}} k(Y) g_0(Y)\,dP_\theta$$

follows, where $K_c^* \subset K$ is the set of critical functions $k(Y)$ with

$$\int_{\{Y\}} k(Y) g_i(Y)\,dP_\theta \le c_i, \quad i = 1, \ldots, m.$$

4) The set $M \subset R^m$ of the points

$$\left(\int_{\{Y\}} k(Y) g_1(Y)\,dP_\theta, \ldots, \int_{\{Y\}} k(Y) g_m(Y)\,dP_\theta\right)$$

generated by the functions $g_i$ for any $k(Y) \in K$ is convex and closed. If
$c = (c_1, \ldots, c_m)^T$ is an inner point of $M$, then there exist $m$ constants
$k_1, \ldots, k_m$ and a $k^*(Y) \in K_c$ so that the condition in (2) is true.

The condition that $k^*(Y)$ with probability 1 has the form in (2) is necessary for
a $k^*(Y) \in K_c$ to fulfil the equation in (3).
If we put $m = 1$ in Corollary 3.2, we get the statements of Theorem 3.1.


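Example 3.2 Let $Y$ be a random sample of size $n$ taken from an $N(\mu, \sigma^2)$-distribution, where $\sigma^2$ is known. Besides we assume that $\mu \in \{a, b\}$, that is, $\mu$ can be either equal to $a$ or equal to $b \neq a$. We want to test $H_0: \mu = a$ against $H_A: \mu = b$. Since the components of $Y$ are continuously distributed, a most powerful $\alpha$-test for this pair of hypotheses has according to Theorem 3.1 the form (3.5) with $\gamma_\alpha = 0$ $(0 < \alpha < 1)$. Besides, we have

\[ L(Y, \theta_0) = L(Y, a) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma^2}\left( \sum_{i=1}^n y_i^2 - 2a\sum_{i=1}^n y_i + na^2 \right) \right\} \]

and

\[ L(Y, \theta_A) = L(Y, b) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma^2}\left( \sum_{i=1}^n y_i^2 - 2b\sum_{i=1}^n y_i + nb^2 \right) \right\} \]

as well as

\[ q = \frac{L(Y, b)}{L(Y, a)} = \exp\left\{ \frac{n}{\sigma^2}\left[ \bar{y}(b-a) - \frac{1}{2}(a+b)(b-a) \right] \right\}. \]

The quantity $c = c_\alpha$ in (3.5) has to be chosen so that $1 - \alpha = P(q < c_\alpha) = P(\ln q < \ln c_\alpha)$.

Considering

\[ \ln q = \frac{n}{\sigma^2}\left[ \bar{y}(b-a) - \frac{1}{2}(a+b)(b-a) \right], \]

the relation $\ln q < \ln c_\alpha$ is equivalent to

\[ \bar{y} < \frac{\sigma^2 \ln c_\alpha}{n(b-a)} + \frac{a+b}{2} \quad \text{for } a < b, \qquad \bar{y} > \frac{\sigma^2 \ln c_\alpha}{n(b-a)} + \frac{a+b}{2} \quad \text{for } a > b. \]

Since

\[ z = \frac{\bar{y} - a}{\sigma}\sqrt{n} \tag{3.12} \]

is $N(0, 1)$-distributed under the null hypothesis $H_0$, it holds with the $(1 - \alpha)$-quantile $z_{1-\alpha}$ of the standard normal distribution under $H_0$

\[ P\left( \frac{\bar{y} - a}{\sigma}\sqrt{n} < z_{1-\alpha} \right) = 1 - \alpha \]

and, equivalently,

\[ P\left( \bar{y} < \frac{\sigma}{\sqrt{n}} z_{1-\alpha} + a \right) = 1 - \alpha . \]

Regarding the case $a < b$, it follows under $H_0$

\[ \frac{\sigma}{\sqrt{n}} z_{1-\alpha} + a = \frac{\sigma^2 \ln c_\alpha}{n(b-a)} + \frac{a+b}{2} \]

and

\[ c_\alpha = \exp\left\{ \frac{\sqrt{n}}{\sigma} z_{1-\alpha}(b-a) - \frac{n}{2\sigma^2}(b-a)^2 \right\}, \]

respectively. Analogously we get for $a > b$

\[ c_\alpha = \exp\left\{ \frac{\sqrt{n}}{\sigma} z_{1-\alpha}(a-b) - \frac{n}{2\sigma^2}(a-b)^2 \right\}. \]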

This leads to an important statement. 


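Theorem 3.2 Let the random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be for known $\sigma^2 > 0$ distributed as $N(\mu 1_n, \sigma^2 I_n)$. Assume that $\mu$ can only have the values $a$ and $b$ (with $a \neq b$). If $H_0: \mu = a$ is tested against $H_A: \mu = b$, then a most powerful $\alpha$-test $k(Y)$ is given in form of (3.5) with $\gamma_\alpha = 0$ and

\[ c_\alpha = \exp\left\{ \frac{\sqrt{n}}{\sigma} z_{1-\alpha}\,|b-a| - \frac{n}{2\sigma^2}(a-b)^2 \right\}, \]

which can be written with $z$ in (3.12) also in the form

\[ k(Y) = \begin{cases} 1 & \text{for } z > z_{1-\alpha} \text{ (if } b > a\text{) or } z < -z_{1-\alpha} \text{ (if } b < a\text{)} \\ 0 & \text{else} \end{cases} \]

that is, $H_0$ is rejected if $z$ exceeds $z_{1-\alpha}$ in absolute value in the direction of the alternative.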


This test is one-sided, since it is known whether b > a or b < a holds. The test 
corresponds to the heuristic-derived test in Example 3.1. Hence, the sample size 
given there is always the smallest possible. 
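
The equivalence between the likelihood-ratio form (3.5) and the $z$-form of Theorem 3.2 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function name `most_powerful_normal_test` and the sample values are hypothetical and not taken from the book, and the rejection rule is written out separately for $b > a$ and $b < a$.

```python
from math import exp, sqrt
from statistics import NormalDist


def most_powerful_normal_test(sample, a, b, sigma, alpha):
    """Test H0: mu = a against HA: mu = b for known sigma (hypothetical helper)."""
    n = len(sample)
    y_bar = sum(sample) / n
    z = (y_bar - a) / sigma * sqrt(n)            # statistic (3.12)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)   # (1 - alpha)-quantile of N(0, 1)
    # One-sided rejection region in the direction of the alternative.
    reject = z > z_crit if b > a else z < -z_crit
    # Likelihood ratio q = L(Y, b) / L(Y, a) and the critical value c_alpha.
    q = exp(n / sigma**2 * (y_bar * (b - a) - 0.5 * (a + b) * (b - a)))
    c_alpha = exp(sqrt(n) / sigma * z_crit * abs(b - a)
                  - n * (b - a) ** 2 / (2.0 * sigma**2))
    return {"z": z, "z_crit": z_crit, "q": q, "c_alpha": c_alpha, "reject": reject}


# Illustrative data (assumed): n = 4 observations, H0: mu = 0 against HA: mu = 1.
result = most_powerful_normal_test([0.9, 1.1, 1.3, 0.7], a=0.0, b=1.0,
                                   sigma=1.0, alpha=0.05)
print(result)
```

Away from the boundary of the critical region, the comparison `q > c_alpha` and the flag `reject` give the same decision, which is the content of the theorem.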

Now we turn to discrete random variables. 


Example 3.3 Let the random variables $y_i$ with the values 0 and 1 be independent from each other and $B(1, p)$-distributed, where $p = P(y_i = 1)$ and $1 - p = P(y_i = 0)$ with $p \in \{p_0, p_A\}$; $i = 1, \ldots, n$. We want to test the null hypothesis $H_0: p = p_0$ against $H_A: p = p_A$. Then $y = \sum_{i=1}^n y_i$ is $B(n, p)$ binomial distributed, and for $Y = (y_1, \ldots, y_n)^T$

\[ L(Y, p) = \binom{n}{y} p^y (1-p)^{n-y}. \]

According to Theorem 3.1, there exists a most powerful $\alpha$-test of the form (3.5). We determine now $\gamma_\alpha$ and $c_\alpha$. Regarding

\[ q = \frac{L(Y, p_A)}{L(Y, p_0)} = \left( \frac{p_A}{p_0} \right)^{y} \left( \frac{1-p_A}{1-p_0} \right)^{n-y} \tag{3.13} \]

the following equation is satisfied:

\[ \ln q = y\left[ \ln p_A - \ln(1-p_A) - \ln p_0 + \ln(1-p_0) \right] + n\left[ \ln(1-p_A) - \ln(1-p_0) \right]. \]

Case A:
For the chosen $\alpha$ there exists a $y^*$ so that the distribution function of $B(n, p_0)$ has at $y^*$ the value $F(y^*, p_0) = 1 - \alpha$. In (3.5) we put $\gamma_\alpha = 0$ and calculate $c_\alpha$, obtaining

\[ c_\alpha = \left( \frac{p_A}{p_0} \right)^{y^*} \left( \frac{1-p_A}{1-p_0} \right)^{n-y^*} \tag{3.14} \]

Case B:
For the chosen $\alpha$ there does not exist such a value $y^*$ considered in case A. But, assuming $p_A > p_0$, there is a value $y^*$ so that $F(y^*, p_0) < 1 - \alpha < F(y^* + 1, p_0)$. Then we choose according to (3.11)

\[ \gamma_\alpha = \frac{F(y^* + 1, p_0) - (1 - \alpha)}{\binom{n}{y^*} p_0^{y^*} (1-p_0)^{n-y^*}} \tag{3.15} \]

and calculate $c_\alpha$ again by (3.14).
If $p_A < p_0$, then a value $y^*$ exists with $F(y^*, p_0) < \alpha < F(y^* + 1, p_0)$. Then we choose

\[ \gamma_\alpha = \frac{\alpha - F(y^*, p_0)}{\binom{n}{y^*} p_0^{y^*} (1-p_0)^{n-y^*}} \tag{3.16} \]

where $c_\alpha$ is calculated again according to (3.14). Therefore, we can formulate the test also directly with $y$.
For $p_A > p_0$ it is

\[ k(y) = \begin{cases} 1 & \text{for } y > y^* \\ \gamma_\alpha & \text{for } y = y^* \quad (\text{with } \gamma_\alpha \text{ from (3.15)}) \\ 0 & \text{for } y < y^* \end{cases} \]

For $p_A < p_0$ it is

\[ k(y) = \begin{cases} 1 & \text{for } y < y^* \\ \gamma_\alpha & \text{for } y = y^* \quad (\text{with } \gamma_\alpha \text{ from (3.16)}) \\ 0 & \text{for } y > y^* \end{cases} \]

Now we turn to special data. If $n = 10$ and $H_0: p = 0.5$ is to be tested against $H_A: p = 0.1$, then the value $y^* = 3$ follows by (3.16) because of $0.1 < 0.5$, and for $\alpha = 0.1$ we get

\[ \gamma_{0.1} = \frac{0.1 - 0.05469}{0.11719} = 0.3866, \]

where $F(3, 0.5) = P(y < 3 \,|\, p = 0.5) = 0.05469$ and $P(y = 3 \,|\, p = 0.5) = 0.11719$.

Then $k(Y)$ has the form

\[ k(y) = \begin{cases} 1 & \text{for } y < 3 \\ 0.3866 & \text{for } y = 3 \\ 0 & \text{for } y > 3 \end{cases} \]

that is, for $y < 3$, the hypothesis $H_0: p = 0.5$ is rejected; $H_0$ is rejected for $y = 3$ with the probability 0.3866; and $H_0$ is accepted for $y > 3$. The random trial in the case $y = 3$ can be simulated on a computer. Using a generator of random numbers supplying uniformly distributed pseudorandom numbers in the interval (0, 1), a value $v$ is obtained. For $v < 0.3866$ the hypothesis $H_0$ is rejected and otherwise accepted. This test is a most powerful 0.1-test.
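
The numbers of this example can be reproduced with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the distribution function is taken as $F(y, p) = P(y' < y)$, the convention under which $F(3, 0.5) = 0.05469$ as used in the worked example.

```python
import random
from math import comb


def pmf(y, n, p):
    """Binomial probability P(Y = y)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def dist_fn(y, n, p):
    """F(y, p) = P(Y < y), the convention assumed for this example."""
    return sum(pmf(j, n, p) for j in range(y))


def lower_tail_test(n, p0, alpha):
    """Critical value and gamma for p_A < p_0, following (3.16)."""
    y = 0
    while dist_fn(y + 1, n, p0) <= alpha:
        y += 1
    gamma = (alpha - dist_fn(y, n, p0)) / pmf(y, n, p0)
    return y, gamma


def upper_tail_test(n, p0, alpha):
    """Critical value and gamma for p_A > p_0, following (3.15)."""
    y = 0
    while dist_fn(y + 1, n, p0) < 1 - alpha:
        y += 1
    gamma = (dist_fn(y + 1, n, p0) - (1 - alpha)) / pmf(y, n, p0)
    return y, gamma


def decide_lower(y, y_crit, gamma, rng):
    """Randomised decision: True means H0 is rejected."""
    if y < y_crit:
        return True
    if y > y_crit:
        return False
    return rng.random() < gamma  # random trial on the boundary y = y_crit


y_minus, gamma_minus = lower_tail_test(10, 0.5, 0.1)
print(y_minus, round(gamma_minus, 4))
print(decide_lower(3, y_minus, gamma_minus, random.Random(0)))
```

The size of the randomised test, $F(y^*, p_0) + \gamma_\alpha P(y = y^*)$, equals $\alpha$ exactly, which is the reason for randomising on the boundary.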

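Now our considerations can be summarised. The proof is analogous to that in
Example 3.3.

Theorem 3.3 If $y$ is distributed as $B(n, p)$, then a most powerful $\alpha$-test for $H_0: p = p_0$ against $H_A: p = p_A < p_0$ is given by

\[ k^-(y) = \begin{cases} 1 & \text{for } y < y^- \\ \gamma_\alpha^- & \text{for } y = y^- \quad (\text{with } \gamma_\alpha^- \text{ from (3.16)}) \\ 0 & \text{for } y > y^- \end{cases} \tag{3.17} \]

and for $H_0: p = p_0$ against $H_A: p = p_A > p_0$ by

\[ k^+(y) = \begin{cases} 1 & \text{for } y > y^+ \\ \gamma_\alpha^+ & \text{for } y = y^+ \quad (\text{with } \gamma_\alpha^+ \text{ from (3.15)}) \\ 0 & \text{for } y < y^+ \end{cases} \tag{3.18} \]

where $y^-$ is determined by

\[ F(y^-, p_0) \le \alpha < F(y^- + 1, p_0) \]

and $y^+$ by

\[ F(y^+, p_0) < 1 - \alpha \le F(y^+ + 1, p_0). \]

Here $F(y, p)$ is the distribution function of $B(n, p)$.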

If possible, randomised tests are avoided. As mentioned earlier, users do 
not really accept that the decisions after well-planned experiments depend 
on randomness. 


3.3 Tests for Composite Alternative Hypotheses and 
One-Parametric Distribution Families 


Theorem 3.1 allows finding most powerful tests for simple null and alternative
hypotheses. In this section we will clarify the way to transfer this theorem to
the case of composite hypotheses.


3.3.1 Distributions with Monotone Likelihood Ratio and Uniformly 
Most Powerful Tests for One-Sided Hypotheses 


It is supposed in the Neyman-Pearson lemma that the null hypothesis as well as the alternative hypothesis is simple and the parameter space consists of only two points. However, such prerequisites are rather artificial and do not meet practical requirements. We intend to relax these restrictions systematically; however, we have to accept that the domain of validity of such extended statements is reduced. First we consider the case $\Omega \subset R^1$ and one-sided (one-tailed) hypotheses. We demonstrate the new situation in the next example.

Example 3.4 Let the components of the random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be distributed as $N(\mu, \sigma^2)$, where $\sigma^2 > 0$ is known. The hypothesis $H_0: \mu \in (-\infty, a]$ is to be tested against $H_A: \mu \in (a, \infty)$. Looking for an $\alpha$-test the condition

\[ \max_{-\infty < \mu \le a} E[k(Y)\,|\,\mu] = \alpha \]

must hold. Regarding the pair of hypotheses

\[ H_0^*: \mu = a; \quad H_A^*: \mu = b > a, \]

a most powerful $\alpha$-test $k(Y)$ is defined (see Theorem 3.2). Since $k(Y)$ is a most powerful test for each $b \in (a, \infty)$, $k(Y)$ is a uniformly most powerful $\alpha$-test for $H_0^*: \mu = a$ against $H_A: \mu \in (a, \infty)$, that is, for $H_0^*$ against $H_A$ in the class $K_\alpha$ of all $\alpha$-tests for $H_0^*$. Since

\[ z = \frac{\bar{y} - a}{\sigma}\sqrt{n} \]

is distributed as $N\left( \frac{\sqrt{n}(\mu - a)}{\sigma}, 1 \right)$ and

\[ E[k(Y)\,|\,\mu] = P\left( \frac{\bar{y} - a}{\sigma}\sqrt{n} > z_{1-\alpha} \right), \]

$E[k(Y)\,|\,\mu]$ increases monotonically in $\mu$ and has for $\mu = a$ the value $\alpha$ and for $\mu \le a$ a value $\le \alpha$. Since $k(Y)$ is a uniformly most powerful test in the class $K_\alpha$, it is a uniformly most powerful test for the pair $H_0: \mu \in (-\infty, a]$, $\sigma^2 > 0$ against $H_A: \mu \in (a, \infty)$, $\sigma^2 > 0$, because the class of tests satisfying $E[k(Y)\,|\,\mu] \le \alpha$ for all $\mu \in (-\infty, a]$ is a subset of $K_\alpha$.

The results are summarised in the following statements. 


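Theorem 3.4 Under the assumptions of Theorem 3.2, $H_0: \mu \le a$ is to be tested against $H_A: \mu > a$. Then

\[ k(Y) = \begin{cases} 1 & \text{for } \dfrac{\bar{y} - a}{\sigma}\sqrt{n} > z_{1-\alpha} \\ 0 & \text{else} \end{cases} \tag{3.19} \]

is a uniformly most powerful $\alpha$-test. Analogously

\[ k(Y) = \begin{cases} 1 & \text{for } \dfrac{\bar{y} - a}{\sigma}\sqrt{n} < z_{\alpha} \\ 0 & \text{else} \end{cases} \]

is a uniformly most powerful $\alpha$-test for $H_0: \mu \ge a$ against $H_A: \mu < a$.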
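
The monotonicity of the power function used in Example 3.4 can be made concrete. The following Python sketch evaluates $E[k(Y)\,|\,\mu]$ for the test (3.19); the function name `power_one_sided` and the chosen values ($a = 0$, $\sigma = 1$, $n = 9$, $\alpha = 0.05$) are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(mu, a, sigma, n, alpha):
    """E[k(Y) | mu] for test (3.19): P(z > z_{1-alpha}) when the true mean is mu."""
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha)
    shift = (mu - a) / sigma * sqrt(n)  # mean of z under mu
    return 1.0 - std.cdf(z_crit - shift)


grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
powers = [power_one_sided(mu, a=0.0, sigma=1.0, n=9, alpha=0.05) for mu in grid]
for mu, beta in zip(grid, powers):
    print(f"mu = {mu:+.1f}  power = {beta:.6f}")
```

On the null side ($\mu \le a$) the values stay at or below $\alpha$, and they reach $\alpha$ exactly at $\mu = a$, so the maximum condition of an $\alpha$-test is met on the boundary.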


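Now we consider normally distributed random samples with known expectation.

Example 3.5 Let the components of the random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be distributed as $N(\mu, \sigma^2)$, where $\mu$ is known. $H_0: \sigma^2 \le \sigma_0^2$ is to be tested against $H_A: \sigma^2 = \sigma_A^2 > \sigma_0^2$. Then

\[ q(n) = \frac{1}{\sigma^2} Q(n) = \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \]

is $CS(n)$- or $\chi^2$-distributed with $n$ degrees of freedom. Considering the pair $\{H_0^*: \sigma^2 = \sigma_0^2,\; H_A: \sigma^2 = \sigma_A^2 > \sigma_0^2\}$, the test

\[ k^*(Y) = \begin{cases} 1 & \text{for } Q(n) \ge \sigma_0^2\, CS(n\,|\,1-\alpha) \\ 0 & \text{else} \end{cases} \]

is according to Theorem 3.1 a most powerful $\alpha$-test, where $CS(n\,|\,1-\alpha)$ is the $(1-\alpha)$-quantile of the $CS(n)$-distribution.

Since this holds for arbitrary $\sigma_A^2 > \sigma_0^2$, $k^*(Y)$ is a uniformly most powerful $\alpha$-test for the pair $\{H_0^*, H_A\}$. Observe that

\[ E[k^*(Y)\,|\,\sigma^2] = P\{\sigma^2 q(n) > \sigma_0^2\, CS(n\,|\,1-\alpha)\} \le \alpha \]

holds for all $\sigma^2 \le \sigma_0^2$. Besides it is

\[ \left\{ q(n) > \frac{\sigma_0^2}{\sigma_1^2} CS(n\,|\,1-\alpha) \right\} \subset \left\{ q(n) > \frac{\sigma_0^2}{\sigma_2^2} CS(n\,|\,1-\alpha) \right\} \]

for $0 < \sigma_1^2 < \sigma_2^2 \le \sigma_0^2$. This implies

\[ E[k^*(Y)\,|\,\sigma_1^2] \le E[k^*(Y)\,|\,\sigma_2^2] \le E[k^*(Y)\,|\,\sigma_0^2] \]

and

\[ \max_{\sigma^2 \le \sigma_0^2} E[k^*(Y)\,|\,\sigma^2] = \alpha, \tag{3.20} \]

respectively. Hence, $k^*(Y)$ is a uniformly most powerful $\alpha$-test for the pair $\{H_0, H_A\}$.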

The results of this example can be stated in a theorem. 


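Theorem 3.5 Let the components of the random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be distributed as $N(\mu, \sigma^2)$, where $\sigma^2 > 0$ and $\mu$ is known. Considering the pairs of hypotheses

a) $H_0: \sigma^2 \le \sigma_0^2$; $H_A: \sigma^2 = \sigma_A^2 > \sigma_0^2$
b) $H_0: \sigma^2 \ge \sigma_0^2$; $H_A: \sigma^2 = \sigma_A^2 < \sigma_0^2$

a uniformly most powerful $\alpha$-test is given by

\[ \text{a)} \quad k^*(Y) = \begin{cases} 1 & \text{for } Q(n) \ge \sigma_0^2\, CS(n\,|\,1-\alpha) \\ 0 & \text{else} \end{cases} \tag{3.21} \]

and

\[ \text{b)} \quad k^*(Y) = \begin{cases} 1 & \text{for } Q(n) \le \sigma_0^2\, CS(n\,|\,\alpha) \\ 0 & \text{else} \end{cases} \tag{3.22} \]

respectively, where

\[ Q(n) = \sum_{i=1}^n (y_i - \mu)^2 . \]

The proof of this theorem essentially exploits the fact that the ratio

\[ \frac{L(Y, \sigma_A^2)}{L(Y, \sigma_0^2)} = \left( \frac{\sigma_0^2}{\sigma_A^2} \right)^{n/2} \exp\left\{ \frac{Q}{2}\left( \frac{1}{\sigma_0^2} - \frac{1}{\sigma_A^2} \right) \right\} \]

is monotone increasing in $Q$ for $\sigma_0^2 < \sigma_A^2$ and monotone decreasing in $Q$ for $\sigma_A^2 < \sigma_0^2$.
Such a property is generally significant to get $\alpha$-tests for one-sided hypotheses concerning real parameters.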


Definition 3.6 A distribution family $P = \{P_\theta, \theta \in \Omega \subset R^1\}$ is said to possess a monotone likelihood ratio, if the quotient

\[ \frac{L(y, \theta_2)}{L(y, \theta_1)} = LR(y\,|\,\theta_1, \theta_2); \quad \theta_1 < \theta_2 \]

at the positions $y$, where at least one of the two likelihood functions $L(y, \theta_1)$ and $L(y, \theta_2)$ is positive, is monotone non-decreasing (isotone) or monotone non-increasing (antitone) in $y$. Observe that $LR(y\,|\,\theta_1, \theta_2)$ is defined as $\infty$ for $L(y, \theta_1) = 0$.


Theorem 3.6 Let $P$ be a one-parametric exponential family in canonical form with respect to the parameter $\theta \in \Omega \subset R^1$. Then $P$ has a monotone likelihood ratio, provided that concerning the exponent of the likelihood function, the factor $T(y)$ is monotone in $y$ and the factor $\eta(\theta)$ monotone in $\theta$.

Proof: W.l.o.g. we assume $\theta_1 < \theta_2$. Then the assertion can evidently be seen regarding

\[ LR(y\,|\,\theta_1, \theta_2) = r\, e^{T(y)[\eta(\theta_2) - \eta(\theta_1)]} \quad \text{with a factor } r = r(\theta_1, \theta_2) > 0 \text{ not depending on } y. \]
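
As an illustration of Definition 3.6 and Theorem 3.6, the likelihood ratio of a binomial family can be tabulated and checked for monotonicity. This Python sketch is not from the book; the parameter values $n = 10$, $p_1 = 0.3$, $p_2 = 0.6$ and all names are assumptions chosen for the demonstration.

```python
from math import comb


def binom_lik(y, n, p):
    """Binomial likelihood L(y, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def likelihood_ratio(y, n, p1, p2):
    """LR(y | p1, p2) = L(y, p2) / L(y, p1) for p1 < p2."""
    return binom_lik(y, n, p2) / binom_lik(y, n, p1)


n, p1, p2 = 10, 0.3, 0.6
ratios = [likelihood_ratio(y, n, p1, p2) for y in range(n + 1)]
# Isotone means non-decreasing in y.
is_isotone = all(r1 <= r2 for r1, r2 in zip(ratios, ratios[1:]))
print(is_isotone)
```

Each step from $y$ to $y + 1$ multiplies the ratio by the constant $\frac{p_2/(1-p_2)}{p_1/(1-p_1)} > 1$, which is the exponential-family structure $e^{T(y)[\eta(\theta_2) - \eta(\theta_1)]}$ with $T(y) = y$.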


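Now we are ready to design uniformly most powerful $\alpha$-tests for one-parametric exponential families and for one-sided hypotheses if we still refer to the next statements.

Theorem 3.7 Karlin (1957)

Let $P = \{P_\theta, \theta \in \Omega \subset R^1\}$ be a family with isotone and antitone likelihood ratio $LR$, respectively. If $g(y)$ is $P_\theta$-integrable and isotone (antitone) in $y \in \{Y\}$, then $E[g(y)\,|\,\theta]$ is isotone (antitone) and antitone (isotone) in $\theta$, respectively. For the distribution function $F(y, \theta)$ of $y$ in the case of isotone $LR$, for all $\theta < \theta'$ and $y \in \{Y\}$, we have

\[ F(y, \theta) \ge F(y, \theta') \]

and in the case of antitone $LR$, for all $\theta < \theta'$ and $y \in \{Y\}$,

\[ F(y, \theta) \le F(y, \theta'). \]

Proof:
Without loss of generality the assertion is shown in the isotone case.

First we suppose $\theta < \theta'$. Further, let $M^+$ and $M^-$ be two sets from the sample space $\{Y\}$ defined by

\[ M^+ = \{y : L(y, \theta') > L(y, \theta)\}, \quad M^- = \{y : L(y, \theta') < L(y, \theta)\}. \]

Since $LR(y\,|\,\theta, \theta')$ is isotone in $y$, we obtain for $y \in M^-$, $y' \in M^+$ the relation $y < y'$. Then the isotonicity of $g(y)$ implies

\[ a = \max_{y \in M^-} g(y) \le \min_{y \in M^+} g(y) = b. \]

Therefore it is

\[ D = E[g(y)\,|\,\theta'] - E[g(y)\,|\,\theta] = \int_{\{Y\}} g(y)(dP_{\theta'} - dP_\theta) = \int_{M^-} g(y)(dP_{\theta'} - dP_\theta) + \int_{M^+} g(y)(dP_{\theta'} - dP_\theta) \]
\[ \ge a \int_{M^-} (dP_{\theta'} - dP_\theta) + b \int_{M^+} (dP_{\theta'} - dP_\theta). \tag{3.23} \]

Evidently we have for each $\theta^* \in \Omega$ the relation

\[ \int_{M^-} dP_{\theta^*} + \int_{M^+} dP_{\theta^*} = 1 - P\{L(y\,|\,\theta') = L(y\,|\,\theta)\,|\,\theta^*\}. \]

It follows for $\theta^* = \theta'$

\[ \int_{M^-} dP_{\theta'} = -\int_{M^+} dP_{\theta'} + 1 - P\{L(y\,|\,\theta') = L(y\,|\,\theta)\,|\,\theta'\} \]

and correspondingly for $\theta^* = \theta$

\[ -\int_{M^-} dP_\theta = \int_{M^+} dP_\theta - 1 + P\{L(y\,|\,\theta') = L(y\,|\,\theta)\,|\,\theta\}. \]

Since both likelihood functions coincide on the set $\{L(y\,|\,\theta') = L(y\,|\,\theta)\}$, this set has the same probability under $\theta$ and $\theta'$. This means that

\[ \int_{M^-} (dP_{\theta'} - dP_\theta) = -\int_{M^+} (dP_{\theta'} - dP_\theta). \]

If this is inserted in (3.23), we finally arrive at

\[ D \ge (b - a) \int_{M^+} (dP_{\theta'} - dP_\theta) \ge 0, \]

considering $b \ge a$ and the definition of $M^+$. Observing the assumption $\theta' > \theta$, this shows that $E[g(y)\,|\,\theta]$ is isotone in $\theta$.
Now we choose $g(y) = \varphi_t(y)$, $t \in R^1$ and

\[ \varphi_t(y) = \begin{cases} 1 & \text{for } y > t \\ 0 & \text{else} \end{cases} \]

Since the function $\varphi_t(y)$ is isotone in $y$, we get

\[ E[\varphi_t(y)\,|\,\theta] \le E[\varphi_t(y)\,|\,\theta'] \]

using the first part of the proof. Because of

\[ E[\varphi_t(y)\,|\,\theta] = P(y > t) = 1 - F(t, \theta), \]

also the last part of the assertion is shown.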
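
Both statements of Theorem 3.7 can be observed numerically for a family with isotone likelihood ratio. The sketch below uses the Poisson family as an assumed example (it is not part of the proof); the function names, the values of $\lambda$, and the isotone function $g(y) = \min(y, 10)$ are hypothetical choices.

```python
from math import exp


def poisson_pmf_table(lam, upper):
    """Probabilities P(y = 0), ..., P(y = upper), built recursively."""
    probs = [exp(-lam)]
    for k in range(1, upper + 1):
        probs.append(probs[-1] * lam / k)
    return probs


def cdf(t, lam):
    """P(y <= t) for a Poisson variable with mean lam."""
    return sum(poisson_pmf_table(lam, t))


def expected_value(g, lam, upper=80):
    """E[g(y) | lam], truncated at `upper` (the remaining tail mass is negligible here)."""
    return sum(g(k) * p for k, p in enumerate(poisson_pmf_table(lam, upper)))


def g(y):
    """An isotone (non-decreasing) function of y."""
    return min(y, 10)


lams = [2.0, 4.0, 8.0]
means_of_g = [expected_value(g, lam) for lam in lams]
print(means_of_g)
print([round(cdf(5, lam), 4) for lam in lams])
```

The expectations of $g$ increase with $\lambda$, while the distribution function at a fixed point decreases, in line with $F(y, \theta) \ge F(y, \theta')$ for $\theta < \theta'$.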


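Theorem 3.8 Let $P = \{P_\theta, \theta \in \Omega \subset R^1\}$ be a distribution family of the components $y_1, \ldots, y_n$ of a random sample $Y$ and $M = M(Y)$ be a sufficient statistic with respect to $P$.

Further, let the distribution family $P^M$ of $M$ possess an isotone likelihood ratio. Denoting by $M_{1-\alpha}$ the $(1 - \alpha)$-quantile of the distribution belonging to $M$, the function

\[ k(Y) = \begin{cases} 1 & \text{for } M > M_{1-\alpha} \\ \gamma_\alpha & \text{for } M = M_{1-\alpha} \\ 0 & \text{for } M < M_{1-\alpha} \end{cases} \tag{3.24} \]

is a test with the following properties:

1) $k(Y)$ is a UMP-test for $H_0: \theta \le \theta_0$ against $H_A: \theta = \theta_A > \theta_0$ and $0 < \alpha < 1$. (Analogously a test for $H_0: \theta \ge \theta_0$ against $H_A: \theta = \theta_A < \theta_0$ can be formulated.)
2) For all $\alpha \in (0, 1)$, there exist $M_0^\alpha$ and $\gamma_\alpha$ with $-\infty \le M_0^\alpha \le \infty$, $0 \le \gamma_\alpha \le 1$ satisfying

\[ P\{M < M_0^\alpha\,|\,\theta_0\} \le 1 - \alpha \le P\{M \le M_0^\alpha\,|\,\theta_0\}, \]

so that the corresponding test $k(Y)$ in (3.24) is, with $\gamma_\alpha$ and $M_0^\alpha = M_{1-\alpha}$, a UMP-$\alpha$-test for $H_0$ against $H_A$.
3) The power function $E[k(Y)\,|\,\theta]$ is isotone in $\theta \in \Omega$.

Proof: According to Theorem 3.1, a most powerful $\alpha$-test for $H_0^*: \theta = \theta_0$ against $H_A: \theta = \theta_A$ has the form

\[ k(M) = \begin{cases} 1 & \text{for } c_\alpha L_M(M, \theta_0) < L_M(M, \theta_A) \\ \gamma(M) & \text{for } c_\alpha L_M(M, \theta_0) = L_M(M, \theta_A) \\ 0 & \text{for } c_\alpha L_M(M, \theta_0) > L_M(M, \theta_A) \end{cases} \]

where $L_M(M, \theta)$ is the likelihood function of $M$. Since $M > M_0^\alpha$ implies

\[ LR_M(M\,|\,\theta_0, \theta_A) \ge LR_M(M_0^\alpha\,|\,\theta_0, \theta_A) \]

because $LR_M$ is isotone,

\[ LR_M(M\,|\,\theta_0, \theta_A) = \frac{L_M(M, \theta_A)}{L_M(M, \theta_0)} \gtreqless \frac{L_M(M_0^\alpha, \theta_A)}{L_M(M_0^\alpha, \theta_0)} = c_\alpha \]

implies

\[ M \gtreqless M_0^\alpha . \]

Hence $k(M)$ is the same as $k^*(M)$ and therefore a most powerful $\alpha$-test for $\{H_0^*, H_A\}$, where $M_0^\alpha = M_{1-\alpha}$ and $\gamma_\alpha$ has to be determined according to the proof of Theorem 3.1. Since $k^*(M)$ is isotone in $M$, the power function $E[k^*(M)\,|\,\theta]$ is by Theorem 3.7 isotone in $\theta$. Hence, assertion (3) is true.

Further we have

\[ \max_{\theta \le \theta_0} E[k^*(M)\,|\,\theta] = \alpha, \]

and $k(Y) = k^*(M)$ in (3.24) is a UMP-$\alpha$-test. Therefore, assertion (1) holds.

If we put $M_0^\alpha = M_{1-\alpha}$ for a fixed $\alpha \in (0, 1)$ taking the $(1 - \alpha)$-quantile $M_{1-\alpha}$ of the distribution of $M$ for $\theta_0$, then we get $\gamma_\alpha = 0$ for $0 = P(M = M_{1-\alpha}\,|\,\theta_0)$ (and, e.g. for all continuous distributions), which shows assertion (2) for this case. Otherwise we choose $M_0^\alpha$ analogously to (3.10) so that

\[ P(M < M_0^\alpha\,|\,\theta_0) \le 1 - \alpha \le P(M \le M_0^\alpha\,|\,\theta_0). \]

Put $M_{1-\alpha} = M_0^\alpha$ in (3.24) and determine $\gamma_\alpha$ analogously to (3.11) as

\[ \gamma_\alpha = \frac{P(M \le M_{1-\alpha}\,|\,\theta_0) - (1 - \alpha)}{P(M = M_{1-\alpha}\,|\,\theta_0)}. \]

Hence, assertion (2) generally follows.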
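
The construction of $M_0^\alpha$ and $\gamma_\alpha$ in the last part of the proof can be sketched for a discrete sufficient statistic. The Python example below assumes, purely for illustration, that $M$ is the sum of $n = 5$ Poisson observations with $\lambda_0 = 2$ (so $M$ has mean 10 under $\theta_0$) and $\alpha = 0.05$; the function names are hypothetical.

```python
from math import exp


def poisson_pmf_table(mean, upper):
    """Probabilities P(M = 0), ..., P(M = upper), built recursively."""
    probs = [exp(-mean)]
    for k in range(1, upper + 1):
        probs.append(probs[-1] * mean / k)
    return probs


def ump_upper_test(mean0, alpha, upper=200):
    """Find m with P(M < m) < 1 - alpha <= P(M <= m) and the matching gamma."""
    cum_below = 0.0  # running value of P(M < m | theta_0)
    for m, p in enumerate(poisson_pmf_table(mean0, upper)):
        if cum_below + p >= 1.0 - alpha:
            gamma = (cum_below + p - (1.0 - alpha)) / p
            return m, gamma
        cum_below += p
    raise ValueError("increase `upper`")


def power(mean, m_crit, gamma, upper=200):
    """E[k | mean] for the test (3.24): P(M > m_crit) + gamma * P(M = m_crit)."""
    probs = poisson_pmf_table(mean, upper)
    return sum(probs[m_crit + 1:]) + gamma * probs[m_crit]


m_crit, gamma = ump_upper_test(mean0=5 * 2.0, alpha=0.05)
print(m_crit, round(gamma, 4))
print([round(power(mean, m_crit, gamma), 4) for mean in (6.0, 8.0, 10.0, 12.0, 14.0)])
```

The power equals $\alpha$ at the boundary of the null hypothesis and increases with the mean, as stated in assertion (3).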


Corollary 3.3 If the components of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ satisfy a distribution of a one-parametric exponential family with $\theta \in \Omega \subset R^1$ and if the natural parameter $\eta(\theta)$ is monotone increasing, then $H_0: \theta \le \theta_0$ can be tested against $H_A: \theta = \theta_A > \theta_0$ for each $\alpha \in (0, 1)$ using a UMP-test $k(Y)$ of the form (3.24).

It is easy to see that tests of $H_0: \theta \le \theta_0$ against $H_A: \theta = \theta_A > \theta_0$ can be analogously designed for antitone $\eta(\theta)$.


Example 3.6 Let the random variable $y$ be $B(n, p)$-distributed. We want to test case A: $H_0: p \le p_0$ against $H_A: p = p_A$, $p_A > p_0$ and case B: $H_0: p \ge p_0$ against $H_A: p = p_A$, $p_A < p_0$ using a sample of size 1. Instead we can take $Y = (y_1, y_2, \ldots, y_n)^T$, where $y_i$ is distributed as $B(1, p)$, because $\sum_{i=1}^n y_i$ is sufficient. The distribution belongs for fixed $n$ to a one-parametric exponential family with the natural parameter

\[ \eta(p) = \ln\frac{p}{1-p}, \]

which is isotone in $p$. The likelihood function is

\[ L(y, \eta) = \binom{n}{y} e^{y\eta(p) + n\ln(1-p)} \]

and we have $T = M = y$. According to Theorem 3.6, the random variable $y$ has a monotone likelihood ratio. Therefore we have to choose $k^*(y)$ in case A according to (3.18) with $\gamma_\alpha^+$ from (3.15) (putting $y^* = y^+$) and in case B according to (3.17) with $\gamma_\alpha^-$ from (3.16) (putting $y^* = y^-$). These tests are UMP-tests for the corresponding $\alpha$.

If $\theta$ is a vector and if we intend to test one-sided hypotheses relating to the components of this vector for unknown values of the remaining components, then UMP-tests exist only in exceptional cases. The same holds already in case $\theta \in R^1$ for simple null hypotheses and two-sided alternative hypotheses. In Section 3.3.2 the latter case is considered, while in Section 3.4 tests are developed for multi-parametric distribution families.

But there are UMP-tests for a composite alternative hypothesis and a two-sided null hypothesis, where two-sided has a special meaning here. This is shown in the next theorem.

Theorem 3.9 We consider for the parameter $\theta$ of the distribution family $P = \{P_\theta, \theta \in \Omega \subset R^1\}$ the pair

\[ H_0: \theta \le \theta_1 \text{ or } \theta \ge \theta_2; \quad \theta_1 < \theta_2; \quad \theta_1, \theta_2 \in \Omega \]
\[ H_A: \theta_1 < \theta < \theta_2; \quad \theta_1 < \theta_2; \quad \theta_1, \theta_2 \in \Omega \]

of hypotheses. If $P$ is an exponential family and if $\theta$ is the natural parameter (until now $\eta$), then also the distribution of the random sample $Y = (y_1, y_2, \ldots, y_n)^T$ belongs to an exponential family with the sufficient statistic $T = T(Y)$ and the natural parameter $\theta = \eta$. Then the following statements hold:

1) There is a uniformly most powerful $\alpha$-test for $\{H_0, H_A\}$ of the form

\[ h(T) = k(Y) = \begin{cases} 1 & \text{for } c_{1\alpha} < T < c_{2\alpha}; \; c_{1\alpha} < c_{2\alpha} \\ \gamma_{i\alpha} & \text{for } T = c_{i\alpha}; \; i = 1, 2 \\ 0 & \text{else} \end{cases} \tag{3.25} \]

where $c_{i\alpha}$ and $\gamma_{i\alpha}$ have to be chosen so that

\[ E[h(T)\,|\,\theta_1] = E[h(T)\,|\,\theta_2] = \alpha. \tag{3.26} \]

(Then we say that $h(T) \in K_\alpha$.)

2) The test $h(T)$ from (1) has the property that $E[h(T)\,|\,\theta]$ for all $\theta < \theta_1$ and $\theta > \theta_2$ is minimal in the class $K_\alpha$ of all tests fulfilling (3.26).

3) For $0 < \alpha < 1$ there is a point $\theta_0$ in the interval $(\theta_1, \theta_2)$ so that the power function $\beta(\theta)$ of $k(Y)$ given in (1) takes its maximum at this point and is monotone decreasing in $|\theta - \theta_0|$, provided that there is no pair $\{T_1, T_2\}$ fulfilling

\[ P(T = T_1\,|\,\theta) + P(T = T_2\,|\,\theta) = 1 \]

for all $\theta \in \Omega$.

The proof of this theorem can be found in the book of Lehmann (1959, pp. 102-103). It is based on Corollary 3.2 with $m = 2$. This theorem is hardly important in practical testing.

If null and alternative hypotheses are exchanged in Theorem 3.9, or, more precisely, if we consider under the assumptions of Theorem 3.9 the pair

\[ H_0: \theta_1 \le \theta \le \theta_2; \quad \theta_1, \theta_2 \in \Omega \subset R^1 \]
\[ H_A: \theta < \theta_1 \text{ or } \theta > \theta_2; \quad \theta_1 < \theta_2; \quad \theta_1, \theta_2 \in \Omega \subset R^1 \]

of hypotheses, there is no uniformly most powerful (UMP-) test, but a uniformly most powerful unbiased (UMPU-) test. This will be shown in the next Section 3.3.2.

3.3.2 UMPU-Tests for Two-Sided Alternative Hypotheses

Let the assumptions of Theorem 3.9 hold for the just defined pair $\{H_0, H_A\}$ of hypotheses. We will show that

\[ h(T) = k(Y) = \begin{cases} 1 & \text{for } T < c_{1\alpha} \text{ or } T > c_{2\alpha}; \; c_{1\alpha} < c_{2\alpha} \\ \gamma'_{i\alpha} & \text{for } T = c_{i\alpha}; \; i = 1, 2 \\ 0 & \text{else} \end{cases} \tag{3.27} \]

is a uniformly most powerful unbiased (UMPU)-test for this pair, if $c_{i\alpha}$ and $\gamma'_{i\alpha}$ are chosen so that (3.26) holds. Since $k(Y)$ is a bounded and measurable function, $E[k(Y)\,|\,\theta]$ is continuous in $\theta$, and therefore differentiation and integration (related to expectation) with respect to $\theta$ can be commuted. Regarding continuity all assumptions of Lemma 3.1 in Section 3.1 are satisfied, where $\Omega^* = \{\theta_1, \theta_2\}$. We have to maximise $E[k(Y)\,|\,\theta']$ for all $k(Y) \in K_\alpha$ and any $\theta'$ outside of $[\theta_1, \theta_2]$ and to minimise $E[\lambda(Y)\,|\,\theta']$ with $\lambda(Y) = 1 - k(Y)$ outside of $[\theta_1, \theta_2]$, respectively, where $\lambda(Y)$ lies in the class $K_{1-\alpha}$ of tests fulfilling

\[ E[\lambda(Y)\,|\,\theta_1] = E[\lambda(Y)\,|\,\theta_2] = 1 - \alpha. \]

Theorem 3.9 implies that $\lambda(Y)$ has the form (3.25), and therefore $k(Y) = 1 - \lambda(Y)$ has the form (3.27), where all $\gamma'_{i\alpha}$ in (3.27) have to be put equal to $1 - \gamma_{i\alpha}$ in (3.25).

Consequently, the test (3.27) is a UMP-$\alpha$-test in $K_\alpha$ and because of Lemma 3.1 also a UMPU-$\alpha$-test. These results are summarised in the next theorem.


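Theorem 3.10 If $P = \{P_\theta, \theta \in \Omega \subset R^1\}$ is an exponential family with the sufficient statistic $T(Y)$ and $k(Y)$ is a test of the form (3.27) for the pair

\[ H_0: \theta_1 \le \theta \le \theta_2; \quad \theta_1 < \theta_2; \quad \theta_1, \theta_2 \in \Omega \subset R^1 \]
\[ H_A: \theta < \theta_1 \text{ or } \theta > \theta_2; \quad \theta_1, \theta_2 \in \Omega \subset R^1 \]

of hypotheses, then $k(Y)$ is a UMPU-$\alpha$-test.

In the applications, a pair $\{H_0, H_A\}$ is often tested with the simple null hypothesis $H_0: \theta = \theta_0$ and the alternative hypothesis $H_A: \theta \neq \theta_0$. This case is now considered.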


Theorem 3.11 If under the assumptions of Theorem 3.10 the pair

\[ H_0: \theta = \theta_0, \; \theta_0 \in \Omega \subset R^1 \]
\[ H_A: \theta \neq \theta_0, \; \theta_0 \in \Omega \subset R^1 \]

of hypotheses is tested using $k(Y)$ in the form (3.27), where all $c_{i\alpha}$ and $\gamma'_{i\alpha}$ are chosen so that

\[ E[k(Y)\,|\,\theta_0] = \alpha \tag{3.28} \]

and

\[ E[T(Y)k(Y)\,|\,\theta_0] = \alpha E[T(Y)\,|\,\theta_0] \tag{3.29} \]

hold, then $k(Y)$ is a UMPU-$\alpha$-test.

Proof: The condition (3.28) ensures that $k(Y)$ is an $\alpha$-test. To get an unbiased test $k(Y)$, the expectation $E[k(Y)\,|\,\theta]$ has to be minimal at $\theta_0$. Therefore

\[ D(\theta) = \frac{\partial}{\partial\theta} E[k(Y)\,|\,\theta] = \frac{\partial}{\partial\theta} \int_{\{Y\}} k(Y)\, dP_\theta \]

necessarily has to be 0 at $\theta = \theta_0$. Since $L(Y, \theta) = C(\theta) e^{\theta T} h(Y)$ by assumption, and $L_T(T, \theta)$ has the same structure, with the notation $C' = \frac{\partial C}{\partial\theta}$ we get

\[ \frac{\partial}{\partial\theta} L_T(T, \theta) = \frac{C'(\theta)}{C(\theta)} L_T(T, \theta) + T L_T(T, \theta) \]

and therefore

\[ D(\theta) = \frac{C'(\theta)}{C(\theta)} E[k(Y)\,|\,\theta] + E[T(Y)k(Y)\,|\,\theta]. \]

Regarding

\[ 0 = \frac{\partial}{\partial\theta} \int_{\{Y\}} dP_\theta = \frac{C'(\theta)}{C(\theta)} \int_{\{Y\}} dP_\theta + E[T(Y)\,|\,\theta], \]

it follows

\[ \frac{C'(\theta)}{C(\theta)} = -E[T(Y)\,|\,\theta]. \]

Because of (3.28) we get

\[ 0 = -\alpha E[T(Y)\,|\,\theta_0] + E[T(Y)k(Y)\,|\,\theta_0] \]

by putting $\theta = \theta_0$ and therefore (3.29). This shows that (3.29) is true because the test is unbiased.

Now let $M$ be the set of the points $\{E[k(Y)\,|\,\theta_0], E[T(Y)k(Y)\,|\,\theta_0]\}$ taking all critical functions $k(Y)$ on $\{Y\}$. Then $M$ is convex and contains for $0 < z < 1$ all points $\{z, zE[T(Y)\,|\,\theta_0]\}$ as well as all points $(\alpha, x_2)$ with $x_2 > \alpha E[T(Y)\,|\,\theta_0]$. This follows because there are tests with $E[k(Y)\,|\,\theta_0] = \alpha$, where $D(\theta_0) > 0$. Analogously we get that $M$ contains also points $(\alpha, x_1)$ with $x_1 < \alpha E[T(Y)\,|\,\theta_0]$. But this means that $(\alpha, \alpha E[T(Y)\,|\,\theta_0])$ is an inner point of $M$. Taking Corollary 3.2, part (4), in Section 3.2, into account, there exist two constants $k_1, k_2$ and a test $k(Y)$ fulfilling (3.28) and (3.29) and supplying $k(Y) = 1$ iff

\[ C(\theta_0)(k_1 + k_2 T) e^{\theta_0 T} < C(\theta') e^{\theta' T}. \]

The $T$-values satisfying this inequality lie either below or above a real constant, respectively, or outside an interval $[c_{1\alpha}, c_{2\alpha}]$. However, the test can have neither the form (3.24) given in Theorem 3.8 for the isotone case nor the corresponding form for the antitone case, because the statement (3) of Theorem 3.8 contradicts (3.29). This shows that the UMPU-test has the form (3.27).


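Example 3.7 Let $P$ be the family of Poisson distributions with $Y = (y_1, \ldots, y_n)^T$ and the likelihood function

\[ L(Y, \lambda) = \prod_{i=1}^n \frac{1}{y_i!}\, e^{\ln\lambda \sum_{i=1}^n y_i - \lambda n}, \quad y_i = 0, 1, 2, \ldots; \; \lambda \in R^+ \]

with the natural parameter $\theta = \ln\lambda$. We want to test the pair $H_0: \lambda = \lambda_0$, $H_A: \lambda \neq \lambda_0$ of hypotheses. The likelihood function of the sufficient statistic $T = \sum_{i=1}^n y_i$ is

\[ L_T(T, \theta) = \frac{1}{T!}\, e^{T\ln(n\lambda) - n\lambda}. \]

It defines also a distribution from a one-parametric exponential family with $\theta = \ln(n\lambda)$ and

\[ H_0: \theta = \theta_0 \text{ with } \theta_0 = \ln(n\lambda_0); \quad H_A: \theta \neq \theta_0. \]

Hence, all assumptions of Theorem 3.11 are fulfilled. Therefore (3.27) is a UMPU-$\alpha$-test for $\{H_0, H_A\}$, if $c_{i\alpha}$ and $\gamma'_{i\alpha}$ $(i = 1, 2)$ are determined so that (3.28) and (3.29) hold. Considering

\[ T L_T(T, \lambda) = n\lambda L_T(T - 1, \lambda) \quad (T = 1, 2, \ldots), \]
\[ E(T\,|\,\theta) = n\lambda = e^{\theta}, \]

and putting now w.l.o.g. $n = 1$, the simultaneous equations

\[ \alpha = P(T < c_{1\alpha}\,|\,\theta_0) + P(T > c_{2\alpha}\,|\,\theta_0) + \gamma'_{1\alpha} L_T(c_{1\alpha}, \theta_0) + \gamma'_{2\alpha} L_T(c_{2\alpha}, \theta_0), \]
\[ \alpha = P(T < c_{1\alpha} - 1\,|\,\theta_0) + P(T > c_{2\alpha} - 1\,|\,\theta_0) + \gamma'_{1\alpha} L_T(c_{1\alpha} - 1, \theta_0) + \gamma'_{2\alpha} L_T(c_{2\alpha} - 1, \theta_0) \tag{3.30} \]

have to be solved.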
The results of this example supply the following statements.

Theorem 3.12 If $y$ is distributed as $P(\lambda)$, then a UMPU-$\alpha$-test for the pair

\[ H_0: \lambda = \lambda_0, \quad H_A: \lambda \neq \lambda_0, \quad \lambda_0 \in R^+ \]

of hypotheses has the form (3.27), where the constants $c_{i\alpha}$ and $\gamma'_{i\alpha}$ are solutions of (3.30) with natural $c_{i\alpha}$ and $0 \le \gamma'_{i\alpha} \le 1$.

Example 3.7 (continuation)
It takes some computation to find the constants $c_{i\alpha}$ and $\gamma'_{i\alpha}$. We give a numerical example to illustrate the solution procedure. We test $H_0: \lambda = 10$ against $H_A: \lambda \neq 10$. The values of the probability function and the likelihood function can be determined by statistical software, for instance, by SPSS or R. We choose $\alpha = 0.1$ and look for possible pairs $(c_1, c_2)$.

For $c_1 = 4$, $c_2 = 15$, we obtain the equations

\[ 0.040924 = 0.018917\gamma'_1 + 0.034718\gamma'_2 \]
\[ 0.013773 = 0.007567\gamma'_1 + 0.052077\gamma'_2 \]

from (3.30) supplying the improper solutions $\gamma'_1 = 2.288$, $\gamma'_2 = -0.068$. The pairs (4, 16) and (5, 15) lead to improper values $(\gamma'_1, \gamma'_2)$, too. Finally, we recognise that only the values $c_1 = 5$, $c_2 = 16$ and $\gamma'_1 = 0.697$, $\gamma'_2 = 0.799$ solve the problem.
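Hence, (3.27) with $c_1 = 5$ and $c_2 = 16$ has the form

\[ k(y) = \begin{cases} 1 & \text{for } y < 5 \text{ or } y > 16 \\ 0.697 & \text{for } y = 5 \\ 0.799 & \text{for } y = 16 \\ 0 & \text{else} \end{cases} \]

and $k(y)$ is the uniformly most powerful unbiased 0.1-test.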
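
The search over pairs $(c_1, c_2)$ described in this example can be automated. The following Python sketch is one possible way to do it, not the procedure used by the authors: for every pair it solves the two linear equations (3.30) for $(\gamma'_1, \gamma'_2)$ by Cramer's rule and returns the first pair with both values in $[0, 1]$. The function name and the truncation point `upper` are assumptions.

```python
from math import exp


def poisson_pmf_table(lam, upper):
    """Probabilities P(T = 0), ..., P(T = upper), built recursively."""
    probs = [exp(-lam)]
    for k in range(1, upper + 1):
        probs.append(probs[-1] * lam / k)
    return probs


def solve_umpu_poisson(lam0, alpha, upper=60):
    """Return (c1, c2, gamma1, gamma2) solving (3.30) with 0 <= gamma_i <= 1."""
    p = poisson_pmf_table(lam0, upper)

    def pmf(k):
        return p[k] if 0 <= k <= upper else 0.0

    def below(c):  # P(T < c)
        return sum(p[:max(c, 0)])

    def above(c):  # P(T > c)
        return 1.0 - sum(p[:max(c + 1, 0)])

    for c1 in range(0, upper):
        for c2 in range(c1 + 1, upper):
            rhs1 = alpha - below(c1) - above(c2)          # first line of (3.30)
            rhs2 = alpha - below(c1 - 1) - above(c2 - 1)  # second line of (3.30)
            a11, a12 = pmf(c1), pmf(c2)
            a21, a22 = pmf(c1 - 1), pmf(c2 - 1)
            det = a11 * a22 - a12 * a21
            if det == 0.0:
                continue
            g1 = (rhs1 * a22 - a12 * rhs2) / det
            g2 = (a11 * rhs2 - a21 * rhs1) / det
            if 0.0 <= g1 <= 1.0 and 0.0 <= g2 <= 1.0:
                return c1, c2, g1, g2
    return None


solution = solve_umpu_poisson(lam0=10.0, alpha=0.1)
print(solution)
```

For $\lambda_0 = 10$ and $\alpha = 0.1$ the sketch returns $c_1 = 5$, $c_2 = 16$ with $\gamma'_1 \approx 0.697$ and $\gamma'_2 \approx 0.799$, the values stated in the example.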


Example 3.8 Let $y$ be distributed as $B(n, p)$. Knowing one observation $y = Y$, we want to test $H_0: p = p_0$ against $H_A: p \neq p_0$, $p_0 \in (0, 1)$. The natural parameter is $\eta = \ln\frac{p}{1-p}$, and $y$ is sufficient with respect to the family of binomial distributions. Therefore the UMPU-$\alpha$-test is given by (3.27), where $c_{i\alpha}$ and $\gamma'_{i\alpha}$ $(i = 1, 2)$ have to be determined from (3.28) and (3.29). With

\[ L_n(y\,|\,p) = \binom{n}{y} p^y (1-p)^{n-y} \]

Equation (3.28) has the form

\[ \sum_{y=0}^{c_{1\alpha}-1} L_n(y\,|\,p_0) + \sum_{y=c_{2\alpha}+1}^{n} L_n(y\,|\,p_0) + \gamma'_{1\alpha} L_n(c_{1\alpha}\,|\,p_0) + \gamma'_{2\alpha} L_n(c_{2\alpha}\,|\,p_0) = \alpha. \tag{3.31} \]

Regarding

\[ y L_n(y\,|\,p) = np\, L_{n-1}(y-1\,|\,p) \quad \text{and} \quad E(y\,|\,p_0) = np_0, \]

the relation (3.29) leads to

\[ \sum_{y=0}^{c_{1\alpha}-1} L_{n-1}(y-1\,|\,p_0) + \sum_{y=c_{2\alpha}+1}^{n} L_{n-1}(y-1\,|\,p_0) + \gamma'_{1\alpha} L_{n-1}(c_{1\alpha}-1\,|\,p_0) + \gamma'_{2\alpha} L_{n-1}(c_{2\alpha}-1\,|\,p_0) = \alpha. \tag{3.32} \]

The solution of these two simultaneous equations can be obtained by statistical software, for example, by R. Further results can be found in the book of Fleiss et al. (2003).
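
A simple enumeration is enough to solve (3.31) and (3.32) for moderate $n$. The Python sketch below is an illustration under assumptions, not the book's software solution: the function names are hypothetical, and the values $n = 10$, $p_0 = 0.3$, $\alpha = 0.1$ are chosen only as an example. Terms with $y - 1 < 0$ are taken as zero.

```python
from math import comb


def binom_pmf(y, n, p):
    """L_n(y | p); zero outside 0..n."""
    if y < 0 or y > n:
        return 0.0
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def solve_umpu_binomial(n, p0, alpha):
    """Return (c1, c2, gamma1, gamma2) solving (3.31) and (3.32)."""
    for c1 in range(0, n + 1):
        for c2 in range(c1 + 1, n + 1):
            tails = list(range(0, c1)) + list(range(c2 + 1, n + 1))
            rhs1 = alpha - sum(binom_pmf(y, n, p0) for y in tails)
            rhs2 = alpha - sum(binom_pmf(y - 1, n - 1, p0) for y in tails)
            a11, a12 = binom_pmf(c1, n, p0), binom_pmf(c2, n, p0)
            a21, a22 = binom_pmf(c1 - 1, n - 1, p0), binom_pmf(c2 - 1, n - 1, p0)
            det = a11 * a22 - a12 * a21
            if det == 0.0:
                continue
            g1 = (rhs1 * a22 - a12 * rhs2) / det
            g2 = (a11 * rhs2 - a21 * rhs1) / det
            if 0.0 <= g1 <= 1.0 and 0.0 <= g2 <= 1.0:
                return c1, c2, g1, g2
    return None


def critical_function(y, c1, c2, g1, g2):
    """The test (3.27) as a function of the observed y."""
    if y < c1 or y > c2:
        return 1.0
    if y == c1:
        return g1
    if y == c2:
        return g2
    return 0.0


solution = solve_umpu_binomial(10, 0.3, 0.1)
print(solution)
```

The returned constants can be verified directly against (3.28) and (3.29) by summing $k(y) L_n(y\,|\,p_0)$ and $y\,k(y) L_n(y\,|\,p_0)$ over $y = 0, \ldots, n$.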


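Example 3.9 If $Y = (y_1, \ldots, y_n)^T$ is a random sample with components distributed as $N(0, \sigma^2)$, then the natural parameter is $\eta = -\frac{1}{2\sigma^2}$ and $\sum_{i=1}^n y_i^2 = T(Y) = T$ is sufficient with respect to the family of $N(0, \sigma^2)$-distributions. The variable $T$ is distributed with the density function $\frac{1}{\sigma^2} g_n\!\left( \frac{T}{\sigma^2} \right)$, where $g_n(x)$ is the density function of a $CS(n)$-distribution. Therefore, the distribution family of $T$ is a one-parametric exponential family (depending on $\sigma^2$). Then

\[ h(T) = k(Y) = \begin{cases} 1 & \text{for } T < c_{1\alpha}\sigma_0^2 \text{ or } T > c_{2\alpha}\sigma_0^2 \\ 0 & \text{else} \end{cases} \]

is a UMPU-$\alpha$-test for the hypotheses $H_0: \sigma^2 = \sigma_0^2$ $(0 < \sigma_0^2 < \infty)$ against $H_A: \sigma^2 \neq \sigma_0^2$, where the constants $c_{i\alpha}$, $i = 1, 2$ are non-negative and satisfy

\[ \int_{c_{1\alpha}}^{c_{2\alpha}} g_n(x)\, dx = 1 - \alpha \tag{3.33} \]

and

\[ \int_{c_{1\alpha}}^{c_{2\alpha}} x\, g_n(x)\, dx = (1 - \alpha)\, E\!\left[ \frac{T}{\sigma_0^2} \,\Big|\, \sigma_0^2 \right] = n(1 - \alpha). \]

A symmetric formulation supplies under the conditions of the next corollary of Theorem 3.11 a UMPU-test (naturally these conditions are not fulfilled for Example 3.9).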


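Corollary 3.4 Let the distribution of the sufficient statistic $T = T(Y)$ be for $\theta = \theta_0$ symmetric with respect to a constant $m$, and let the assumptions of Theorem 3.11 be fulfilled. Then a UMPU-$\alpha$-test is given by (3.27), $c_{2\alpha} = 2m - c_{1\alpha}$, $\gamma'_{1\alpha} = \gamma'_{2\alpha} = \gamma'_\alpha$ and

\[ P\{T(Y) < c_{1\alpha}\,|\,\theta_0\} + \gamma'_\alpha P\{T(Y) = c_{1\alpha}\,|\,\theta_0\} = \frac{\alpha}{2}. \tag{3.34} \]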




110 | Mathematical Statistics 


Proof: Regarding

\[ P\{T(Y) < m - x\} = P\{T(Y) > m + x\}, \]

Equation (3.28) is satisfied for $x = m - c_{1\alpha}$, where $c_{1\alpha}$ is the $\alpha/2$-quantile of the $T$-distribution. Then we have

\[ E\{T(Y)k(Y)\,|\,\theta_0\} = E\{[T(Y) - m]k(Y)\,|\,\theta_0\} + mE\{k(Y)\,|\,\theta_0\}. \]

Since the first summand on the right-hand side vanishes for a $k(Y)$ which fulfils the assumptions above (symmetry), it follows $E\{T(Y)k(Y)\,|\,\theta_0\} = mE\{k(Y)\,|\,\theta_0\} = m\alpha$ and because of $E\{T(Y)\,|\,\theta_0\} = m$ also (3.29).
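
For a symmetric null distribution, Corollary 3.4 replaces the two simultaneous equations by the single condition (3.34). The Python sketch below illustrates this for $T \sim B(n, 0.5)$, which is symmetric about $m = n/2$; the function name and the values $n = 10$, $\alpha = 0.1$ are assumptions for the demonstration.

```python
from math import comb


def symmetric_umpu_binomial(n, alpha):
    """Solve (3.34) for T ~ B(n, 0.5): returns (c1, c2, gamma)."""
    pmf = [comb(n, y) * 0.5**n for y in range(n + 1)]
    below = 0.0  # running value of P(T < c1)
    for c1, p in enumerate(pmf):
        if below + p > alpha / 2:
            gamma = (alpha / 2 - below) / p
            return c1, n - c1, gamma  # c2 = 2m - c1 with m = n / 2
        below += p
    raise ValueError("alpha must be smaller than 2")


def critical_function(y, c1, c2, gamma):
    """The test (3.27) with equal randomisation constants."""
    if y < c1 or y > c2:
        return 1.0
    if y == c1 or y == c2:
        return gamma
    return 0.0


n, alpha = 10, 0.1
c1, c2, gamma = symmetric_umpu_binomial(n, alpha)
pmf = [comb(n, y) * 0.5**n for y in range(n + 1)]
size = sum(critical_function(y, c1, c2, gamma) * pmf[y] for y in range(n + 1))
moment = sum(y * critical_function(y, c1, c2, gamma) * pmf[y] for y in range(n + 1))
print(c1, c2, round(gamma, 4), round(size, 6), round(moment, 6))
```

The computed `size` reproduces (3.28), and `moment` equals $\alpha\, E[T\,|\,\theta_0] = \alpha n / 2$, so (3.29) holds without having been imposed separately.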


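Example 3.10 If $Y = (y_1, y_2, \ldots, y_n)^T$ is a random sample with components distributed as $N(\mu, \sigma^2)$ and $\sigma^2$ is known, then $T = T(Y) = \sum_{i=1}^n y_i$ is sufficient with respect to $\mu$. The statistic $\frac{1}{n}T = \bar{y}$ is distributed as $N\!\left( \mu, \frac{\sigma^2}{n} \right)$, that is, it is symmetrically distributed with respect to $\mu$. Regarding the hypotheses $H_0: \mu = \mu_0$, $H_A: \mu \neq \mu_0$, a UMPU-$\alpha$-test is given by

\[ k(Y) = \begin{cases} 1 & \text{for } z < z_{\alpha/2} \text{ or } z > z_{1-\alpha/2} \\ 0 & \text{else} \end{cases} \tag{3.35} \]

if $z_P$ is the $P$-quantile of the standard normal distribution and $z = \frac{\bar{y} - \mu_0}{\sigma}\sqrt{n}$. In the description of (3.27), expressed in terms of $\bar{y}$, we obtain

\[ c_{1\alpha} = \mu_0 + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{and} \quad c_{2\alpha} = \mu_0 - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \mu_0 + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}. \]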
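
The test (3.35) and its critical bounds for $\bar{y}$ are straightforward to evaluate. The following Python sketch is an illustration with assumed data and a hypothetical function name; it uses only the standard normal quantile function of the standard library.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample, mu0, sigma, alpha):
    """Test (3.35) of H0: mu = mu0 against HA: mu != mu0 for known sigma."""
    n = len(sample)
    y_bar = sum(sample) / n
    z = (y_bar - mu0) / sigma * sqrt(n)
    z_low = NormalDist().inv_cdf(alpha / 2)         # z_{alpha/2}
    z_high = NormalDist().inv_cdf(1.0 - alpha / 2)  # z_{1-alpha/2}
    reject = z < z_low or z > z_high
    # Critical bounds expressed on the scale of the sample mean.
    c1 = mu0 + z_low * sigma / sqrt(n)
    c2 = mu0 + z_high * sigma / sqrt(n)
    return {"z": z, "reject": reject, "c1": c1, "c2": c2}


# Illustrative data (assumed): four measurements, mu0 = 10, sigma = 0.5.
outcome = two_sided_z_test([10.2, 9.8, 10.5, 10.1], mu0=10.0, sigma=0.5, alpha=0.05)
print(outcome)
```

The bounds `c1` and `c2` lie symmetrically around $\mu_0$, which is the symmetric situation covered by Corollary 3.4.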


3.4 Tests for Multi-Parametric Distribution Families 


In several examples we supposed normally distributed components $y_i$, where either $\mu$ or $\sigma^2$ was assumed to be known. However, in most applications both parameters are unknown. When we test a hypothesis with respect to one parameter, the other unknown parameter is a disturbance (nuisance) parameter. Here we mainly describe a procedure for designing $\alpha$-tests that are, on the common boundary of the closed subsets of $\Omega$ belonging to both hypotheses, independent of a sufficient statistic with respect to the nuisance parameters. At the end of this section, we briefly discuss a further possibility. Then we need the concept of $\alpha$-similar tests and especially of $\alpha$-similar tests on the common boundary $\Omega^*$ of $\omega$ and $\Omega \setminus \omega$ given in Definition 3.2. We start with an example.

Example 3.11 Let the vector $Y = (y_1, y_2, \ldots, y_n)^T$ be a random sample whose components are distributed as $N(\mu, \sigma^2)$. We test the null hypothesis $H_0: \mu = \mu_0$ against $H_A: \mu \neq \mu_0$ for arbitrary $\sigma^2$. The statistic

\[ t(\mu) = \frac{\bar{y} - \mu}{s}\sqrt{n} \]

is a function of the sufficient statistic $M = (\bar{y}, s^2)^T$. It is centrally distributed as $t(n - 1)$. Here and in the following examples,

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2 \]

is the sample variance, the unbiased estimator of the variance $\sigma^2$. The statistic $t(\mu_0)$ is non-centrally distributed as $t\!\left( n - 1; \frac{\mu - \mu_0}{\sigma}\sqrt{n} \right)$. Therefore the test

\[ k(Y) = \begin{cases} 1 & \text{for } |t(\mu_0)| > t\!\left( n-1 \,\big|\, 1 - \frac{\alpha}{2} \right) \\ 0 & \text{else} \end{cases} \tag{3.36} \]

is an $\alpha$-test, where $t\!\left( n-1 \,\big|\, 1 - \frac{\alpha}{2} \right)$ is the $\left( 1 - \frac{\alpha}{2} \right)$-quantile of the central $t$-distribution with $n - 1$ degrees of freedom. These quantiles are shown in Table D.3. Since $\Omega^*$ is in this case the straight line in the positive $(\mu, \sigma^2)$-half-plane $(\sigma^2 > 0)$ defined by $\mu = \mu_0$ and $P\{k(Y) = 1\,|\,\mu_0\} = \alpha$ for all $\sigma^2$, $k(Y)$ is an $\alpha$-similar test on $\Omega^*$.


3.4.1 General Theory 


Definition 3.7 We consider a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ from a family $P = \{P_\theta, \theta \in \Omega\}$ of distributions $P_\theta$ and write $\Omega_0 = \omega$ and $\Omega_A = \Omega \setminus \omega$ for the subsets in $\Omega$ defined by the null and alternative hypothesis, respectively. The set $\Omega^* = \bar{\Omega}_0 \cap \bar{\Omega}_A$ denotes the common boundary of the closed sets $\bar{\Omega}_0$ and $\bar{\Omega}_A$. Let $P^* \subset P$ be the subfamily $P^* = \{P_\theta, \theta \in \Omega^* \subset \Omega\}$ on this common boundary. We assume that there is a (non-trivial) sufficient statistic $T(Y)$ with respect to $\Omega^*$ so that $E[k(Y)\,|\,T(Y)]$ is independent of $\theta \in \Omega^*$, that is, $k(Y)$ is $\alpha$-similar on $\Omega^*$ with

\[ \alpha = E[k(Y)\,|\,T(Y), \theta \in \Omega^*]. \tag{3.37} \]

A test $k(Y)$ satisfying (3.37) is said to be an $\alpha$-test with Neyman structure. Hence, tests with Neyman structure are always $\alpha$-similar on $\Omega^*$. Moreover they have the property that $\alpha$ can be calculated by (3.37) as a conditional expectation given the value of the sufficient statistic $T(Y)$. Since the conditional expectation in (3.37) for each surface defined by $T(Y) = T$ is independent of $\theta \in \Omega^*$, the tests of this section can be reduced for each single $T$-value to such of preceding sections (provided that (3.37) holds). Therefore we will look for UMP-tests or UMPU-tests in the set of all tests with Neyman structure by trying to find a sufficient statistic with respect to $P^*$. But first of all we want to know whether tests with Neyman structure exist. The next theorem states conditions for it.


Theorem 3.13 If the statistic $T(Y)$ with the notations of Definition 3.7 is sufficient with respect to $P^*$, then a test $k(Y)$ that is $\alpha$-similar on the boundary has with probability 1 a Neyman structure with respect to $T(Y)$ iff the family $P^T$ of the distributions of $T(Y)$ is bounded complete.


Proof: 


a) Let $P^T$ be bounded complete, and let $k^*(Y)$ be $\alpha$-similar on the boundary. Then the equation $E[k^*(Y) - \alpha \mid \theta \in \Omega^*] = 0$ is fulfilled. Now consider

\[
d(Y) = k(Y) - \alpha = E[k^*(Y) - \alpha \mid T(Y), \theta \in \Omega^*].
\]

Since $T(Y)$ is sufficient, we get $E[d(Y) \mid P^T] = 0$. However, critical functions $k(Y)$ are bounded by definition. Thus the assertion follows from the bounded completeness.

b) If $P^T$ is not bounded complete, then there exist a function $f$ and a real $C > 0$ so that $|f[T(Y)]| \le C$ and $E\{f[T(Y)] \mid \theta \in \Omega^*\} = 0$, but that $f[T(Y)] \ne 0$ holds with positive probability for at least one element of $P^T$. Putting $\frac{1}{C}\min(\alpha, 1-\alpha) = K$, the function

\[
k(Y) = h[T(Y)] = K f[T(Y)] + \alpha
\]

is a test because of $0 \le k(Y) \le 1$ for all $Y \in \{Y\}$ and, because of

\[
E[k(Y) \mid \theta \in \Omega^*] = K\,E\{f[T(Y)] \mid \theta \in \Omega^*\} + \alpha = \alpha,
\]

it is $\alpha$-similar on the boundary $\Omega^*$. But $k(Y) \ne \alpha$ holds for elements of $P^T$ with $f(T) \ne 0$. Therefore the test has no Neyman structure.


Using Theorems 1.3 and 1.4 as well as Lemma 3.1, the problems of this section 
can be solved for k-parametric exponential families. Solutions can also be found 
for other distribution families. But we do not want to deal with them here. 


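Theorem 3.14 Let us choose $\theta = (\lambda, \theta_2, \ldots, \theta_k)^T$, $\lambda \in R^1$ in Definition 3.7 and consider each of the hypotheses $H_0: \lambda \in \Omega_0 \subset R^1$ and $H_A: \lambda \notin \Omega_0 \subset R^1$ with arbitrary $\theta_2, \ldots, \theta_k$. Besides, let $P$ be a k-parametric exponential family with natural parameters $\eta_1, \ldots, \eta_k$ where we put $\eta_1 = \lambda$ and $T_1(Y) = S(Y) = S$. Then UMPU-$\alpha$-tests exist for $\{H_0, H_A\}$, namely, we get for $\Omega_0 = (-\infty, \lambda_0]$ the form

\[
k(Y) = h(S \mid T^*) = \begin{cases} 1 & \text{for } S > c_\alpha(T^*) \\ \gamma_\alpha(T^*) & \text{for } S = c_\alpha(T^*) \\ 0 & \text{else,} \end{cases} \tag{3.38}
\]

for $\Omega_0 = [\lambda_0, \infty)$ the form

\[
k(Y) = h(S \mid T^*) = \begin{cases} 1 & \text{for } S < c_\alpha(T^*) \\ \gamma_\alpha(T^*) & \text{for } S = c_\alpha(T^*) \\ 0 & \text{else} \end{cases} \tag{3.39}
\]

and for $\Omega_0 = [\lambda_1, \lambda_2]$ the form

\[
k(Y) = h(S \mid T^*) = \begin{cases} 1 & \text{for } S < c_{1\alpha}(T^*) \text{ or } S > c_{2\alpha}(T^*) \\ \gamma_{i\alpha}(T^*)\ (i = 1, 2) & \text{for } S = c_{i\alpha}(T^*) \\ 0 & \text{else.} \end{cases} \tag{3.40}
\]

The constants in (3.38) and (3.39) have to be determined so that

\[
E[h(S \mid T^*) \mid T^*, \theta \in \Omega^*] = \alpha
\]

for all $T^*$. Further, the constants in (3.40) must fulfil the equation

\[
E[h(S \mid T^*) \mid T^*, \theta \in \Omega^*] = \alpha
\]

and, in the special case $\lambda_1 = \lambda_2 = \lambda_0$, both equations

\[
E[h(S \mid T^*) \mid T^*, \lambda = \lambda_0] = \alpha,
\]
\[
E[S\,h(S \mid T^*) \mid T^*, \lambda = \lambda_0] = \alpha\,E[S \mid T^*, \lambda = \lambda_0]
\]

with probability 1, respectively (analogous to (3.28) and (3.29)).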
Proof: Regarding the null hypothesis, we have

\[
\Omega^* = \{\theta : \lambda = \lambda_0;\ \eta_2, \ldots, \eta_k \text{ arbitrary}\}
\]

if $\Omega_0 = (-\infty, \lambda_0]$, $\Omega_0 = [\lambda_0, \infty)$ or $\Omega_0 = \{\lambda_0\}$ and

\[
\Omega^* = \{\theta : \lambda = \lambda_1 \text{ or } \lambda = \lambda_2;\ \eta_2, \ldots, \eta_k \text{ arbitrary}\}
\]

if $\Omega_0 = [\lambda_1, \lambda_2]$, $\lambda_1 \ne \lambda_2$. Because of Theorems 1.3 and 1.4, $T$ is complete (and therefore bounded complete) as well as sufficient with respect to $P$ and therefore also with respect to $P^*$. The conditional distribution of $S$ for given $T^*$ belongs to a one-parametric exponential family with the parameter space $\Omega \cap R^1$. In the case of one-sided hypotheses, the test $k(Y)$ in (3.38) and (3.39), respectively, designed analogously to (3.24), is a UMP-$\alpha$-test for known $\eta_2, \ldots, \eta_k$ by Corollary 3.3 of Theorem 3.8 for suitable choice of the constants. Taking the sufficiency of $T(Y)$ into account, these constants can be determined independently of $\eta_2, \ldots, \eta_k$. Hence, the tests $k(Y)$ in (3.38) and (3.39), respectively, have Neyman structure by Theorem 3.13. Therefore, both are UMPU-$\alpha$-tests because of Lemma 3.1. The assertion for the two-sided case follows analogously by applying Theorems 3.10 and 3.11.


Example 3.12 Let $\binom{x}{y}$ be bivariate distributed with independent components. Let the marginal distributions of $x$ be $P(\lambda_x)$ and of $y$ be $P(\lambda_y)$, respectively ($0 < \lambda_x, \lambda_y < \infty$). We want to test $H_0: \lambda_x = \lambda_y$ against $H_A: \lambda_x \ne \lambda_y$. We take a sample of size $n$ and put $T^* = x + y$. The conditional distribution of $x$ for given $T^*$ is a $B(T^*, p)$-distribution with $p = \frac{\lambda_x}{\lambda_x + \lambda_y}$, where $T^*$ is distributed as $P(\lambda_x + \lambda_y)$. Hence, the probability function of the two-dimensional random variable $(x, T^*)$ is

\[
p(x, T^* \mid \theta, \eta_2) = \binom{T^*}{x} \frac{1}{T^*!}\, e^{\theta x + \eta_2 T^* - A(\eta)},
\]

which has the form of an exponential family for $\theta = \ln\frac{\lambda_x}{\lambda_y}$, $\eta_2 = \ln \lambda_y$ and $A(\eta) = e^{\eta_2}(1 + e^{\theta})$. Now the pair $\{H_0, H_A\}$ can be written as $H_0: \theta = 0$, $\eta_2$ arbitrary and $H_A: \theta \ne 0$, $\eta_2$ arbitrary. Therefore the UMPU-$\alpha$-test for $\{H_0, H_A\}$ has the form (3.40).
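Under $H_0$ (i.e. for $p = \frac{1}{2}$) the conditional distribution of $x$ for given $T^*$ is symmetric with respect to $\frac{1}{2}T^*$. By Corollary 3.4 of Theorem 3.11, the constants in (3.28) and (3.29), respectively, have to be calculated from

\[
c_{1\alpha}(T^*) = c_\alpha, \quad c_{2\alpha}(T^*) = T^* - c_\alpha,
\]
\[
\gamma_{1\alpha}(T^*) = \gamma_{2\alpha}(T^*) = \frac{\frac{\alpha}{2} - F\left(c_\alpha \mid T^*, p = \frac{1}{2}\right)}{p\left(c_\alpha \mid T^*, p = \frac{1}{2}\right)} \tag{3.41}
\]

where $c_\alpha$ is the largest integer for which the distribution function $F\left(c_\alpha \mid T^*, p = \frac{1}{2}\right)$ of the $B\left(T^*, \frac{1}{2}\right)$-distribution is less than or equal to $\frac{\alpha}{2}$. Further, $p\left(c_\alpha \mid T^*, p = \frac{1}{2}\right)$ is the probability function of the $B\left(T^*, \frac{1}{2}\right)$-distribution.

The results of the example imply the following statement.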


Theorem 3.15 If $x$ and $y$ are independent of each other and distributed as $P(\lambda_x)$ and $P(\lambda_y)$, respectively, and if $H_0: \lambda_x = \lambda_y$ is tested against $H_A: \lambda_x \ne \lambda_y$, then a UMPU-$\alpha$-test is given by (3.40), where the constants are determined with the notations of Example 3.12 by (3.41).
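The determination of the constants in (3.41) can be sketched in R. This is only an illustration under stated assumptions: the function names `umpu_poisson_constants` and `conditional_size` are hypothetical, and the distribution function is read as $F(c) = P(x < c)$, which is the reading under which the randomisation probability lies in $[0, 1)$.

```r
# Sketch: constants of the conditional two-sided test for H0: lambda_x = lambda_y.
# Assumption (not stated in the text): F(c | T*, 1/2) is read as P(x < c).
umpu_poisson_constants <- function(t_star, alpha = 0.05) {
  stopifnot(t_star >= 1)                                  # T* = 0 gives a degenerate case
  cand <- 0:t_star
  below <- pbinom(cand - 1, size = t_star, prob = 0.5)    # P(x < c) for c = 0, ..., T*
  c_alpha <- max(cand[below <= alpha / 2])                # largest c with P(x < c) <= alpha/2
  gamma <- (alpha / 2 - pbinom(c_alpha - 1, t_star, 0.5)) /
    dbinom(c_alpha, t_star, 0.5)
  list(c1 = c_alpha, c2 = t_star - c_alpha, gamma = gamma)
}

# Conditional size of the test of form (3.40) with these constants, given T* = t_star
conditional_size <- function(t_star, const) {
  x <- 0:t_star
  k <- ifelse(x < const$c1 | x > const$c2, 1,
              ifelse(x == const$c1 | x == const$c2, const$gamma, 0))
  sum(k * dbinom(x, t_star, 0.5))
}

const <- umpu_poisson_constants(10, alpha = 0.05)
size <- conditional_size(10, const)
print(const)
print(size)
```

For an observed pair $(x, y)$ one would call the function with `t_star = x + y` and compare $x$ with the two critical values; the conditional size reproduces the prescribed $\alpha$.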
The following theorem allows the simple construction of further tests. The theory presented so far does not yet supply the t-test of (3.36) in Example 3.11, which is often used in applications.


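Theorem 3.16 Let the assumptions of Theorem 3.14 be fulfilled. If moreover there exists a function $g(S, T^*)$, which is isotone in $S$ for all $T^*$, and if $g = g(S, T^*)$ is under $H_0$ independent of $T^*$, then the statements of Theorem 3.14 hold for the tests

\[
k(Y) = r(g) = \begin{cases} 1 & \text{for } g > c_\alpha \\ \gamma_\alpha & \text{for } g = c_\alpha \\ 0 & \text{else} \end{cases} \tag{3.42}
\]

in the case $\Omega_0 = (-\infty, \lambda_0]$,

\[
k(Y) = r(g) = \begin{cases} 1 & \text{for } g < c_\alpha \\ \gamma_\alpha & \text{for } g = c_\alpha \\ 0 & \text{else} \end{cases} \tag{3.43}
\]

in the case $\Omega_0 = [\lambda_0, \infty)$, and

\[
k(Y) = r(g) = \begin{cases} 1 & \text{for } g < c_{1\alpha} \text{ or } g > c_{2\alpha} \\ \gamma_{i\alpha} & \text{for } g = c_{i\alpha}\ (i = 1, 2) \\ 0 & \text{else} \end{cases} \tag{3.44}
\]

in the case $\Omega_0 = [\lambda_1, \lambda_2]$, if $c_\alpha$ and $\gamma_\alpha$ are determined in (3.42) and (3.43), respectively, so that $k(Y)$ is an $\alpha$-test and if for (3.44) conditions analogous to both of the last equations of Theorem 3.14 are fulfilled.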


Proof: The prescriptions for determining the constants imply $E[r(g) \mid H_0] = \alpha$, that is, for instance, in the case of the test in (3.42)

\[
P(g > c_\alpha) + \gamma_\alpha P(g = c_\alpha) = \alpha.
\]

Since $g$ is independent of $T^*$ for $\lambda = \lambda_0$, $c_\alpha$ and $\gamma_\alpha$ are independent of $T^*$.

Further, since $g(S, T^*)$ is isotone in $S$ for each $T^*$, the tests in (3.42) and in (3.38) as well as analogously the tests in (3.43) and in (3.39) are equivalent (i.e. their rejection regions in the sample space $\{Y\}$ are identical). The same conclusion can be made in the two-sided case with respect to the tests in (3.44) and (3.40) if only the last equations of Theorem 3.14 are replaced by the equivalent conditions

\[
E[r(g) \mid T^*, \lambda_0] = \alpha,
\]
\[
E[g\,r(g) \mid T^*, \lambda_0] = \alpha\,E[g \mid T^*, \lambda_0].
\]

We will use this theorem for showing that the t-test in Example 3.11 is a UMPU-test.


Example 3.11 (continuation)

We know from Chapter 1 that $\left(\sum_{i=1}^n y_i, \sum_{i=1}^n y_i^2\right)^T = T$ is minimal sufficient with respect to the family of $N(\mu, \sigma^2)$-distributions. With the notations of Theorem 3.14, we put

\[
S = \bar{y} - \mu_0 \quad \text{and} \quad T^* = \sum_{i=1}^n (y_i - \mu_0)^2,
\]

where $T^*$ is complete sufficient with respect to $P^*$ (i.e. with respect to the family of $N(\mu_0, \sigma^2)$-distributions). Now we consider

\[
t = g = g(S, T^*) = \frac{\sqrt{n}\,S}{\sqrt{\frac{1}{n-1}\left(T^* - nS^2\right)}} = \frac{\bar{y} - \mu_0}{s}\sqrt{n}. \tag{3.45}
\]

We know that for $\mu = \mu_0$ the statistic $g$ is distributed as $t(n-1)$, independently of $\sigma^2 \in R^+$. Moreover, for $\mu = \mu_0$ the statistic $\frac{1}{\sigma^2}T^*$ is distributed as $CS(n)$. Therefore Theorem 1.5 implies that $g$ and $T^*$ are independent for all $\theta \in \Omega^*$ (i.e. for $\mu = \mu_0$). Thus the assumptions of Theorem 3.16 are fulfilled because $g$ is isotone in $S$ for each $T^*$. Consequently the t-test is a UMPU-$\alpha$-test.

This leads to the test of W.S. Gosset published in 1908 under the pseudonym Student (Student, 1908).


Theorem 3.17 Student (1908)

If the $n > 1$ components of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ are distributed as $N(\mu, \sigma^2)$, then the so-called t-test (Student's test) for testing $H_0: \mu = \mu_0$, $\sigma^2$ arbitrary, of the form

\[
k(Y) = \begin{cases} 1 & \text{for } t > t(n-1 \mid 1-\alpha) \\ 0 & \text{else} \end{cases}
\]

for $H_A: \mu > \mu_0$, $\sigma^2$ arbitrary, of the form

\[
k(Y) = \begin{cases} 1 & \text{for } t < -t(n-1 \mid 1-\alpha) \\ 0 & \text{else} \end{cases}
\]

for $H_A: \mu < \mu_0$, $\sigma^2$ arbitrary, and of the form

\[
k(Y) = \begin{cases} 1 & \text{for } |t| > t\left(n-1 \mid 1-\frac{\alpha}{2}\right) \\ 0 & \text{else} \end{cases}
\]

for $H_A: \mu \ne \mu_0$, $\sigma^2$ arbitrary, respectively, is a UMPU-$\alpha$-test, where $t(n-1 \mid P)$ is the $P$-quantile of the central t-distribution with $n - 1$ degrees of freedom.
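As a small illustration of the two-sided case of the theorem, the statistic (3.45) and the decision can be computed directly in R and checked against the built-in `t.test`. The helper name `student_test` and the data values are hypothetical and serve only as a sketch.

```r
# Sketch: two-sided one-sample t-test of H0: mu = mu0 (hypothetical helper and data)
student_test <- function(y, mu0, alpha = 0.05) {
  n <- length(y)
  t_stat <- (mean(y) - mu0) / sd(y) * sqrt(n)   # test statistic (3.45)
  crit <- qt(1 - alpha / 2, df = n - 1)         # (1 - alpha/2)-quantile of t(n - 1)
  list(t = t_stat, critical = crit, reject = abs(t_stat) > crit)
}

y <- c(5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2)       # illustrative values only
res <- student_test(y, mu0 = 5)
ref <- t.test(y, mu = 5)                        # base R implementation for comparison
print(res)
```

For the one-sided alternatives only the critical value changes: `qt(1 - alpha, df = n - 1)` is used and the sign of the statistic is taken into account.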

First we will show how the sample size can be determined appropriately corresponding to Example 3.1. We want to calculate the sample size so that for given risks of the first and the second kind, a fixed difference of practical relevance related to the value of the null hypothesis can be recognised. We suppose that for each $n > 1$, $Y = (y_1, y_2, \ldots, y_n)^T$ is a random sample with components distributed as $N(\mu, \sigma^2)$.

We test the null hypothesis $H_0: \mu = \mu_0$, $\sigma^2$ arbitrary, against one of the alternatives:

a) $H_A: \mu > \mu_0$, $\sigma^2$ arbitrary,
b) $H_A: \mu < \mu_0$, $\sigma^2$ arbitrary,
c) $H_A: \mu \ne \mu_0$, $\sigma^2$ arbitrary.

The test statistic

\[
t(\mu_0) = \frac{\bar{y} - \mu_0}{s}\sqrt{n}
\]

in (3.45) is under $H_0$ centrally t-distributed; in general it is non-centrally t-distributed with the non-centrality parameter $\lambda = \frac{\mu - \mu_0}{\sigma}\sqrt{n}$.

Actually, each difference between the parameter value under the null hypothesis ($\mu_0$) on the one hand and under the alternative hypothesis ($\mu_1$) on the other hand can become significant as soon as the sample size is large enough. Hence, a significant result alone is not yet meaningful. Basically it expresses nothing, because the difference could also be very small, for instance, $|\mu_1 - \mu_0| = 0.00001$. Therefore, investigations have to be planned by fixing the difference from the parameter value of the null hypothesis ($\mu_0$) that is practically relevant. For explaining the risk $\beta$ of the second kind, we pretended the alternative hypothesis would consist only of one single value $\mu_1$. But in most applications $\mu_1$ can take all values apart from $\mu_0$ for two-sided test problems and all values smaller than or larger than $\mu_0$ for one-sided test problems. The point is that each value of $\mu_1$ causes another value for the risk $\beta$ of the second kind. More precisely, $\beta$ is the smaller, the larger the difference $\mu_1 - \mu_0$ is. The quantity $E = (\mu_1 - \mu_0)/\sigma$, that is, the relative or standardised practically relevant difference, is called (relative) effect size.

Therefore the fixing of the practically interesting minimal difference $\delta = \mu_1 - \mu_0$ is an essential step for planning an investigation. Namely, if $\delta$ is determined and if certain risks $\alpha$ of the first kind and $\beta$ of the second kind are chosen, then the necessary sample size can be calculated. The fixing of $\alpha$, $\beta$ and $\delta$ is called the precision requirement. The crucial point is that differences $\mu_1 - \mu_0$ equal to or larger than the prescribed $\delta$ should not be overlooked insofar as it is possible. To say it more precisely, such differences are to remain unrecognised only with a probability less than or equal to $\beta$.

The sample size that fulfils the posed precision requirement can be obtained by the power function of the test. This function states the power for given sample size and for all possible values of $\delta$, that is, the probability for rejecting the null hypothesis if indeed the alternative hypothesis holds. If the null hypothesis is true, the power function has the value $\alpha$. It would not be fair to compare the power of a test with $\alpha = 0.01$ with that of a test with $\alpha = 0.05$, because a larger $\alpha$ also means that the power is larger for all arguments referring to the alternative hypothesis. Hence only tests with the same $\alpha$ are compared with each other.

For calculating the required sample size, we first look for all power functions related to all possible sample sizes that have the probability $\alpha$ for $\mu_0$, that is, the parameter value under the null hypothesis. Now we look up the point of the minimum difference $\delta$. Then we choose among all power functions the one that has the probability $1 - \beta$ at this point, that is, the probability for the justified rejection of the null hypothesis; hence, at this point the probability of unjustified acceptance, that is, of making an error of the second kind, is $\beta$. Finally, we have to choose the size $n$ corresponding to this power function. For two-sided test problems, the points $-\delta$ and $+\delta$ have to be fixed. Figure 3.2 illustrates that deviations larger than $\delta$ are overlooked with still lower probability. A practical method for obtaining a value of $\sigma$ is as follows: divide the expected range of the investigated character, that is, the difference between the imaginably maximal and minimal realisation of the character, by 6 (assuming a normal distribution, approximately 99% of the realisations lie between $\mu_0 - 3\sigma$ and $\mu_0 + 3\sigma$) and use the result as an estimate of $\sigma$.

For unknown variance $\sigma^2$ we can use the sample variance of a prior sample of size $n$ between 10 and 30.


Figure 3.2 The power functions of the t-test testing the null hypothesis $H_0: \mu = \mu_0$ against $H_A: \mu \ne \mu_0$ for a risk $\alpha = 0.05$ of the first kind and a sample size n = 5 (bold-plotted curve below) as well as other values of n (broken-lined curves) up to n = 20 (bold-plotted curve above).

For example, assuming a power of 0.9, the relative effect can be read on the abscissa, and it is approximately 1.5 for n = 7.
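Power curves of the kind shown in Figure 3.2 can be sketched with the non-central t-distribution available in base R (`pt` with the argument `ncp`). The function name `t_test_power` is hypothetical; the sketch assumes the two-sided one-sample t-test with the relative effect $(\mu - \mu_0)/\sigma$ on the abscissa.

```r
# Sketch: power of the two-sided one-sample t-test as a function of the
# relative effect (mu - mu0) / sigma; helper name is hypothetical.
t_test_power <- function(effect, n, alpha = 0.05) {
  df <- n - 1
  ncp <- effect * sqrt(n)                  # non-centrality parameter lambda
  crit <- qt(1 - alpha / 2, df)            # critical value of the central t-distribution
  pt(-crit, df, ncp = ncp) +               # P(t < -crit)
    pt(crit, df, ncp = ncp, lower.tail = FALSE)   # P(t > crit)
}

effects <- seq(0, 3, by = 0.5)
power_n5 <- t_test_power(effects, n = 5)
power_n20 <- t_test_power(effects, n = 20)
print(round(rbind(effects, power_n5, power_n20), 3))
```

At the effect 0 the function returns $\alpha$, and for each positive effect the power grows with $n$, which is the behaviour described for the family of curves.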


Hints Referring to the Statistical Software Package R 

In practical investigations professional statistical software is used to determine appropriate sample sizes for given values of $\alpha$, $\beta$ and $\delta$; in this book we apply mainly R. The software package R is an adaptation of the programming language S, which has been developed since 1976 by John Chambers and colleagues in the Bell Laboratories. The functionality of R can be extended by everybody without any restrictions using free software tools; moreover, special statistical methods as well as routines written in C and FORTRAN can be integrated. Such tools are offered on the Internet in standardised archives. The most popular archive is probably CRAN (Comprehensive R Archive Network), a server network that is supervised by the R Development Core Team. This network also offers the package OPDOE (Optimal Design of Experiments), which was thoroughly described in the book of Rasch et al. (2011b).

Apart from only a few exceptions, R contains implementations for all statistical methods concerning analysis, evaluation and planning.

The software package R is available free of charge under http://cran.r-project.org/ for the operating systems Linux, Mac OS X and Windows. The installation under Microsoft Windows takes place via 'Windows'. Choosing 'base' there, the installation platform is reached. With 'Download R 2.X.X for Windows' (X stands for the required version number), the setup file can be downloaded. After this file is started, the setup assistant runs through the single installation steps. Concerning this book, all standard settings can be adopted. The interested reader will find more information about R under http://www.r-project.org/

After starting R the input window is opened, presenting the red-coloured input request: '>'. Here commands can be written and carried out by pressing the enter button. The output is given directly below the command line. The user can also insert line breaks and indentation for increasing clarity; this does not influence the functional procedure. If a command is continued on the next line, this line begins with '+', the continuation sign displayed by R. A command reads, for instance, as follows:


> cbind(ul_t1.tab, ul_t1.pro, ul_t1.cum)


The Workspace is a special working environment in R. Certain objects obtained during the current work with R can be stored there. Such objects contain not only results of computations but also data sets. A Workspace is loaded using the menu


File - Load Workspace...

Now we turn to the calculation of sample sizes. We describe the procedure for calculations by hand and list the corresponding commands in R.

The test statistic (3.45) is non-centrally t-distributed with $n - 1$ degrees of freedom and the non-centrality parameter $\lambda = \frac{\mu - \mu_0}{\sigma}\sqrt{n}$. Under the null hypothesis $\mu = \mu_0$ we have $\lambda = 0$. Taking the $(1-\alpha)$-quantile $t(n-1 \mid 1-\alpha)$ of the central t-distribution with $n - 1$ degrees of freedom and the $\beta$-quantile $t(n-1, \lambda \mid \beta)$ of the corresponding non-central t-distribution, we obtain in the one-sided case the condition

\[
t(n-1 \mid 1-\alpha) = t(n-1, \lambda \mid \beta)
\]

because of the requirement $1 - \pi(\mu) = P\{t < t(n-1 \mid 1-\alpha)\} = \beta$, where $t$ is distributed as $t(n-1; \lambda)$. This means that the $(1-\alpha)$-quantile of the central t-distribution (the distribution under the null hypothesis) has to be equal to the $\beta$-quantile of the non-central t-distribution with non-centrality parameter $\lambda$, where $\lambda$ depends on the minimum difference $\delta$. We illustrate these facts by Figure 3.3.

We apply an approximation that is sufficiently precise for the calculation of sample sizes by hand, namely,

\[
t(n-1, \lambda \mid \beta) \approx t(n-1 \mid \beta) + \lambda = -t(n-1 \mid 1-\beta) + \frac{\delta}{\sigma}\sqrt{n}.
\]

Analogous to Example 3.1 the minimum sample size $n$ is therefore obtained by

\[
n = \left\lceil \left[t(n-1 \mid 1-\alpha) + t(n-1 \mid 1-\beta)\right]^2 \frac{\sigma^2}{\delta^2} \right\rceil,
\]

where $\lceil x \rceil$ again denotes rounding up to the next integer.

Figure 3.3 Graphical illustration of the risks $\alpha$ and $\beta$.

After fixing $\alpha$, $\beta$, $\delta$ and $\sigma$, the sample size $n$ can be iteratively calculated by this formula. Now we put $\delta = \sigma$, that is, deviations of at least one standard deviation are to be overlooked at most with the probability $\beta$. For $\alpha = 0.05$ and $\beta = 0.2$, we start the iterations with $n_{(0)} = \infty$ and get


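\[
t(\infty \mid 0.95) = 1.6449, \quad t(\infty \mid 0.8) = 0.8416,
\]

followed by

\[
n_{(1)} = \left\lceil [1.6449 + 0.8416]^2 \right\rceil = \lceil 6.18 \rceil = 7;
\]
\[
t(6 \mid 0.95) = 1.9432, \quad t(6 \mid 0.8) = 0.9057;
\]
\[
n_{(2)} = \left\lceil [1.9432 + 0.9057]^2 \right\rceil = \lceil 8.12 \rceil = 9;
\]
\[
t(8 \mid 0.95) = 1.8595, \quad t(8 \mid 0.8) = 0.8889;
\]
\[
n_{(3)} = \left\lceil [1.8595 + 0.8889]^2 \right\rceil = \lceil 7.55 \rceil = 8;
\]
\[
t(7 \mid 0.95) = 1.8946, \quad t(7 \mid 0.8) = 0.8960;
\]
\[
n_{(4)} = \left\lceil [1.8946 + 0.8960]^2 \right\rceil = \lceil 7.79 \rceil = 8.
\]

Hence, $n = 8$ is the minimum sample size. In the case of a two-sided alternative, we calculate $n = 10$ using R (see Table 3.2). Here $1 - \alpha$ has to be replaced in the t-quantile by $1 - \alpha/2$.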
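The hand iteration above can be written as a short R function. This is a sketch of the approximate formula only, not of the exact procedure used by OPDOE; the function name `approx_sample_size` and its arguments are hypothetical.

```r
# Sketch: iterate n = ceiling([t(n-1 | P) + t(n-1 | 1-beta)]^2 / effect^2),
# starting with n = Inf (normal quantiles); effect = delta / sigma.
approx_sample_size <- function(alpha, beta, effect, two_sided = FALSE, max_iter = 100) {
  p <- if (two_sided) 1 - alpha / 2 else 1 - alpha
  n <- Inf
  for (i in seq_len(max_iter)) {
    df <- n - 1                                   # Inf - 1 is Inf: qt() then gives normal quantiles
    n_new <- ceiling((qt(p, df) + qt(1 - beta, df))^2 / effect^2)
    n_new <- max(2, n_new)                        # keep at least one degree of freedom
    if (n_new == n) break                         # fixed point reached
    n <- n_new
  }
  n                                               # last iterate if no fixed point was found
}

n_one_sided <- approx_sample_size(0.05, 0.2, effect = 1)
n_two_sided <- approx_sample_size(0.05, 0.2, effect = 1, two_sided = TRUE)
print(c(one_sided = n_one_sided, two_sided = n_two_sided))
```

With $\delta = \sigma$, $\alpha = 0.05$ and $\beta = 0.2$ the iterates are 7, 9, 8, 8 in the one-sided case, as in the calculation by hand.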

Table 3.2 lists the sample sizes in the case just considered for a two-sided alternative with $\alpha = 0.05$, $\beta = 0.2$ and some values of $\delta$, computed with the R package OPDOE (according to the exact formula). The corresponding command is


> size.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.8,
+ type = "one.sample", alternative = "two.sided")


where sd = $\sigma$, sig.level = $\alpha$ and power = $1 - \beta$. Remember that a continuation line of a command begins with the sign '+'.
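If OPDOE is not installed, a comparable calculation can be sketched with `power.t.test` from the stats package that ships with R. This is a swapped-in base R function, not the OPDOE routine of the text, so small differences in rounding conventions are possible.

```r
# Sketch: exact sample size for the two-sided one-sample t-test with base R
# (stats::power.t.test is used here instead of OPDOE's size.t.test).
res <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.8,
                    type = "one.sample", alternative = "two.sided")
n_required <- ceiling(res$n)   # res$n is a real number; round up to an integer size
print(n_required)
```

The argument names `delta`, `sd`, `sig.level`, `power`, `type` and `alternative` coincide with those of the command above.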

Exploiting the previous results the reader can prove for many of the customary tests used in applications that they are UMPU-$\alpha$-tests.


Table 3.2 Values of n depending on $\delta = c \cdot \sigma$ for $\alpha = 0.05$, $\beta = 0.20$ and a two-sided alternative (i.e. $P = 1 - \alpha/2$).

δ    (1/25)σ   (1/10)σ   (1/5)σ   (1/4)σ   (1/3)σ   (1/2)σ   1σ
n    4908      787       199      128      73       34       10


Example 3.13 Supposing the conditions of Example 3.11, we want to test $H_0: \sigma^2 = \sigma_0^2$, $\mu$ arbitrary, against a one- or two-sided alternative hypothesis. Here we restrict ourselves to the alternative $H_A: \sigma^2 \ne \sigma_0^2$, $\mu$ arbitrary. Put

\[
\lambda = -\frac{1}{2\sigma^2}, \quad \eta_2 = \frac{n\mu}{\sigma^2}, \quad S = \sum_{i=1}^n y_i^2, \quad T^* = \bar{y}
\]

and

\[
g = g(S, T^*) = \frac{1}{\sigma_0^2}\left[S - nT^{*2}\right] = \frac{1}{\sigma_0^2}\sum_{i=1}^n (y_i - \bar{y})^2.
\]

The function $g$ is for each $T^*$ isotone in $S$. Besides, $\bar{y}$ is complete sufficient with respect to the family $N(\mu, \sigma_0^2)$ (i.e. with respect to $P^*$). Since $g$ is distributed as $CS(n-1)$, the mapping

\[
k(Y) = \begin{cases} 1 & \text{for } g < c_{1\alpha} \text{ or } g > c_{2\alpha} \\ 0 & \text{else} \end{cases}
\]

is a UMPU-$\alpha$-test if $c_{1\alpha}$ and $c_{2\alpha}$ are determined according to (3.32) and (3.33) with $n - 1$ instead of $n$.

Now we discuss whether it is always favourable to look for UMPU-tests. The exclusion of the nuisance parameters $\eta_2, \ldots, \eta_k$, as it was described in this section, is only one of several possibilities. We can also design tests so that the condition

\[
\max_{\theta \in \Omega^*} E[k(Y) \mid \theta] = \alpha
\]

is fulfilled. We want to consider both possibilities in the following case. Let the random variables $x$ and $y$ be mutually independently distributed as $B(1, p_x)$ and $B(1, p_y)$, respectively (each following a two-point distribution), where

\[
P(x = 1) = p_x,\ P(x = 0) = 1 - p_x,\ 0 < p_x < 1,
\]
\[
P(y = 1) = p_y,\ P(y = 0) = 1 - p_y,\ 0 < p_y < 1.
\]

Further, the null hypothesis $H_0: p_x = p_y = p$, $p$ arbitrary in $(0, 1)$, is to be tested against $H_A: p_x < p_y$; $p_x$, $p_y$ arbitrary in $(0, 1)$, with a risk $\alpha$ ($0 < \alpha < 0.25$) of the first kind. The set of possible realisations of $(x, y)$ is

\[
\{Y\} = \{(x, y) : x = 0, 1;\ y = 0, 1\}.
\]

The boundary $\Omega^*$ is the set of possible $p$-values, namely, $\Omega^* = (0, 1)$, the diagonal $p_x = p_y$ in the square $(0, 1) \times (0, 1)$.

First we design a test that fulfils the above given maximum condition. Because of $\Omega_0 = \Omega^*$, the condition

\[
\max_{0 < p < 1} E[k_1(Y) \mid p] = \alpha
\]

has to hold, which supplies for $\alpha < 0.25$ (with $c_\alpha = 0$)

\[
k_1(Y) = \begin{cases} 4\alpha & \text{for } (x, y) = (0, 1) \\ 0 & \text{else} \end{cases}
\]

taking

\[
E[k_1(Y)] = 4\alpha P(x = 0, y = 1) = 4\alpha(1 - p_x)p_y
\]

into account. Under the null hypothesis the expectation is equal to $4\alpha p(1 - p)$. This expression takes its maximum $\alpha$ for $p = 1/2$. Therefore $k_1(Y)$ is an $\alpha$-test.

Now we design a UMPU-$\alpha$-test. Evidently we have

\[
P(x = x, y = y) = \binom{1}{x} p_x^x (1 - p_x)^{1-x} \binom{1}{y} p_y^y (1 - p_y)^{1-y} = p_x^x p_y^y (1 - p_x)^{1-x}(1 - p_y)^{1-y}.
\]

Putting $T^* = x + y$, $S = x$, we see from (3.37) analogous to Example 3.12 that

\[
P(x = x \mid T^* = 1, p_x = p_y) = \frac{1}{2} \quad (x = 0, 1)
\]

is true under $H_0$, whereas $x$ is determined by $T^*$ for $T^* = 0$ and $T^* = 2$. Then

\[
k_2(Y) = \begin{cases} 2\alpha & \text{for } x = 0, y = 1 \\ \alpha & \text{for } x = y \\ 0 & \text{for } x = 1, y = 0 \end{cases}
\]

is with $c_\alpha = 0$ the realisation of a UMPU-$\alpha$-test, since

\[
E[k_2(Y)] = 2\alpha p_y(1 - p_x) + \alpha\left[p_x p_y + (1 - p_x)(1 - p_y)\right] + 0 = \alpha(1 + p_y - p_x),
\]

which is equal to $\alpha$ under the null hypothesis. If the power functions $\pi_1(p_x, p_y)$ and $\pi_2(p_x, p_y)$ of both tests, namely,

\[
\pi_1(p_x, p_y) = 4\alpha(1 - p_x)p_y, \quad \pi_2(p_x, p_y) = \alpha(1 - p_x + p_y),
\]

are compared, then we get (here 'more powerful' means a larger power)

$k_1(Y)$ is more powerful than $k_2(Y)$, if $4(1 - p_x)p_y > 1 + p_y - p_x$;

$k_1(Y)$ is biased, if $4(1 - p_x)p_y < 1$.

The parameter space is determined by $p_x \le p_y$. It is easy to see that the biased test $k_1(Y)$ is in a considerable part of the parameter space more powerful than the unbiased test $k_2(Y)$. If a priori information is available that the differences between $p_x$ and $p_y$ are rather great or that only rather great differences are of interest, then $k_1(Y)$ should be preferred to $k_2(Y)$.
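The comparison of both tests can be checked numerically by enumerating the four possible outcomes. The sketch below is an illustration only; it assumes the usual convention $p_x = P(x = 1)$, $p_y = P(y = 1)$ for the two-point distributions, and all function names are hypothetical.

```r
# Sketch: power of the two randomised tests for H0: px = py against HA: px < py.
# Convention assumed here: px = P(x = 1), py = P(y = 1).
alpha <- 0.05

k1 <- function(x, y) if (x == 0 && y == 1) 4 * alpha else 0                  # maximum condition
k2 <- function(x, y) if (x == 0 && y == 1) 2 * alpha else if (x == y) alpha else 0  # Neyman structure

# Expected value of a critical function k over the four outcomes (x, y)
power_of <- function(k, px, py) {
  total <- 0
  for (x in 0:1) for (y in 0:1) {
    prob <- px^x * (1 - px)^(1 - x) * py^y * (1 - py)^(1 - y)
    total <- total + k(x, y) * prob
  }
  total
}

# Closed forms used for comparison
pi1 <- function(px, py) 4 * alpha * (1 - px) * py
pi2 <- function(px, py) alpha * (1 + py - px)

far <- c(k1 = power_of(k1, 0.1, 0.9), k2 = power_of(k2, 0.1, 0.9))    # large difference
near <- c(k1 = power_of(k1, 0.3, 0.4), k2 = power_of(k2, 0.3, 0.4))   # small difference
print(rbind(far, near))
```

For $p_x = 0.1$, $p_y = 0.9$ the test $k_1$ has the larger power, whereas for $p_x = 0.3$, $p_y = 0.4$ its power lies below $\alpha$, which illustrates the bias.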
3.4.2 The Two-Sample Problem: Properties of Various Tests and Robustness


The following examples for UMPU-$\alpha$-tests are of such great practical importance that we dedicate an entire section to them. Moreover, as representatives of all test problems in this chapter, these tests are to be compared with tests not belonging to the UMPU-class, where consequences of violated or modified assumptions concerning the underlying distributions are also pointed out. We consider two independent random samples $Y_1 = (y_{11}, \ldots, y_{1n_1})^T$, $Y_2 = (y_{21}, \ldots, y_{2n_2})^T$, where the components $y_{ij}$ are supposed to be distributed as $N(\mu_i, \sigma_i^2)$. We intend to test the null hypothesis

\[
H_0: \mu_1 = \mu_2 = \mu,\ \sigma_1^2, \sigma_2^2 \text{ arbitrary}
\]

against

\[
H_A: \mu_1 \ne \mu_2,\ \sigma_1^2, \sigma_2^2 \text{ arbitrary}.
\]

The UMPU-$\alpha$-tests for one-sided alternatives with $\sigma_1^2 = \sigma_2^2$ can be designed analogously. This work is left to the reader.

The second class of tests we consider concerns the pair

\[
H_0: \sigma_1^2 = \sigma_2^2 = \sigma^2,\ \mu_1, \mu_2 \text{ arbitrary}
\]
\[
H_A: \sigma_1^2 \ne \sigma_2^2,\ \mu_1, \mu_2 \text{ arbitrary}
\]

of hypotheses. Since we use two random samples belonging to different distributions, it is called a two-sample problem. Regarding each pair $(i, j)$, $1 \le i \le n_1$, $1 \le j \le n_2$, the vector variable $\binom{y_{1i}}{y_{2j}}$ belongs to a two-dimensional (or bivariate) normal distribution with the expectation vector $\binom{\mu_1}{\mu_2}$ and the covariance matrix $\begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$, representing a four-parametric exponential family. Therefore, the random vector $Y = \binom{Y_1}{Y_2}$ has also a distribution from a four-parametric exponential family with the natural parameters

\[
\eta_k = \frac{n_k \mu_k}{\sigma_k^2}\ (k = 1, 2); \quad \eta_3 = -\frac{1}{2\sigma_1^2}, \quad \eta_4 = -\frac{1}{2\sigma_2^2}
\]

and the complete sufficient statistics

\[
T_i(Y) = \bar{y}_i\ (i = 1, 2); \quad T_3(Y) = \sum_{i=1}^{n_1} y_{1i}^2, \quad T_4(Y) = \sum_{j=1}^{n_2} y_{2j}^2.
\]
3.4.2.1 Comparison of Two Expectations
Considering the pair of hypotheses with respect to the expectations, we cannot design a UMPU-$\alpha$-test in general. We are only successful for the special case $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (variance homogeneity).


3.4.2.1.1 A UMPU-α-Test for Normal Distributions in the Case of Variance Homogeneity
We want to design a test for the pair

\[
H_0: \mu_1 = \mu_2 = \mu,\ \sigma_1^2 = \sigma_2^2 = \sigma^2 \text{ arbitrary}
\]
\[
H_A: \mu_1 \ne \mu_2,\ \sigma_1^2 = \sigma_2^2 = \sigma^2 \text{ arbitrary}
\]

of hypotheses. Then the common distribution of the random variable $Y = \binom{Y_1}{Y_2}$ is an element of a three-parametric exponential family, which can be written with the natural parameters

\[
\lambda = \frac{\mu_1 - \mu_2}{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\sigma^2}, \quad \eta_2 = \frac{n_1\mu_1 + n_2\mu_2}{(n_1 + n_2)\sigma^2}, \quad \eta_3 = -\frac{1}{2\sigma^2}
\]

and the corresponding statistics

\[
S = \bar{y}_1 - \bar{y}_2, \quad T_1^* = n_1\bar{y}_1 + n_2\bar{y}_2, \quad T_2^* = \sum_{i=1}^{n_1} y_{1i}^2 + \sum_{j=1}^{n_2} y_{2j}^2.
\]

Besides, $(T_1^*, T_2^*) = T^*$ is complete sufficient with respect to $P^*$ (i.e. for the case $\mu_1 = \mu_2 = \mu$, where $P^*$ is a two-parametric exponential family). According to Theorem 3.14, there is a UMPU-$\alpha$-test for our problem. We consider

\[
g = g(S, T^*) = \frac{S}{\sqrt{\frac{1}{n_1 + n_2 - 2}\left[T_2^* - \frac{1}{n_1 + n_2}T_1^{*2} - \frac{n_1 n_2}{n_1 + n_2}S^2\right]}} = \frac{\bar{y}_1 - \bar{y}_2}{s} \tag{3.46}
\]

with

\[
s^2 = \frac{\sum_{i=1}^{n_1}(y_{1i} - \bar{y}_1)^2 + \sum_{j=1}^{n_2}(y_{2j} - \bar{y}_2)^2}{n_1 + n_2 - 2}.
\]

The distribution of $g$ under $H_0$ depends neither on the value $\mu = \mu_1 = \mu_2$ nor on $\sigma^2$, since the numerator of $g$ is distributed as $N\left(0, \frac{n_1 + n_2}{n_1 n_2}\sigma^2\right)$ and, independently of it, the square of the denominator multiplied by $\frac{n_1 + n_2 - 2}{\sigma^2}$ is distributed as $CS(n_1 + n_2 - 2)$, referring to the first quotient representation in (3.46). Hence, by Theorem 1.5 the random variable $g$ is independent of $T^*$. The test statistic

\[
t = \frac{\bar{y}_1 - \bar{y}_2}{s}\sqrt{\frac{n_1 n_2}{n_1 + n_2}} \tag{3.47}
\]

is distributed as $t\left(n_1 + n_2 - 2;\ \frac{\mu_1 - \mu_2}{\sigma}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}\right)$. Therefore the UMPU-$\alpha$-test for $H_0$ against $H_A$ based on (3.47) has the form

\[
k(Y) = \begin{cases} 1 & \text{for } |t| > t\left(n_1 + n_2 - 2 \mid 1 - \frac{\alpha}{2}\right) \\ 0 & \text{else.} \end{cases}
\]

This test is called the two-sample t-test.
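The statistic (3.47) can be computed directly and compared with the built-in `t.test` with `var.equal = TRUE`. The helper name `two_sample_t` and the data values are hypothetical and serve only as a sketch.

```r
# Sketch: two-sample t-test with pooled variance (hypothetical helper and data)
two_sample_t <- function(y1, y2, alpha = 0.05) {
  n1 <- length(y1)
  n2 <- length(y2)
  # pooled variance s^2 with n1 + n2 - 2 degrees of freedom
  s2 <- (sum((y1 - mean(y1))^2) + sum((y2 - mean(y2))^2)) / (n1 + n2 - 2)
  t_stat <- (mean(y1) - mean(y2)) / sqrt(s2) * sqrt(n1 * n2 / (n1 + n2))  # statistic (3.47)
  crit <- qt(1 - alpha / 2, df = n1 + n2 - 2)
  list(t = t_stat, df = n1 + n2 - 2, reject = abs(t_stat) > crit)
}

y1 <- c(12.1, 11.4, 13.0, 12.6, 11.9, 12.8)      # illustrative values only
y2 <- c(13.4, 12.9, 14.1, 13.8, 12.7)            # illustrative values only
res <- two_sample_t(y1, y2)
ref <- t.test(y1, y2, var.equal = TRUE)          # base R implementation for comparison
print(res)
```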


Example 3.14 (Optimal Sample Size)
We want to calculate the optimal (i.e. the minimal) total size of both samples so that the precision requirements $\alpha = 0.05$, $\beta = 0.1$ and $\sigma = \delta = \mu_1 - \mu_2$ hold. For given total size $N = n_1 + n_2$ the factor $\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$ in (3.47) becomes maximal if $n_1 = n_2 = n$. We take this choice. Observing the mentioned precision requirements, the non-centrality parameter of the t-distribution is

\[
\lambda = \frac{\mu_1 - \mu_2}{\sigma}\sqrt{\frac{n_1 n_2}{n_1 + n_2}} = \frac{\delta}{\sigma}\sqrt{\frac{n}{2}}.
\]

Analogous to the one-sample case, the condition

\[
t[2(n-1) \mid P] = t[2(n-1); \lambda \mid \beta]
\]

has to be realised. Using the R package OPDOE from CRAN, the size $n$ of a random sample can be determined. Again we choose for one-sided alternatives $P = 1 - \alpha$ and for two-sided alternatives $P = 1 - \alpha/2$.

The commands in R have to be modified only slightly compared with the one-sample problem as you can see below:


> size.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.8,
+ type = "two.sample", alternative = "two.sided")

If the calculation is made by hand, we can again use the formula

\[
n \approx \left\lceil 2\left\{t[2(n-1) \mid P] + t[2(n-1) \mid 1-\beta]\right\}^2 \frac{\sigma^2}{\delta^2} \right\rceil
\]

obtained by approximation.
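For the precision requirement of this example ($\alpha = 0.05$, $\beta = 0.1$, $\delta = \sigma$, two-sided alternative) the size per sample can also be sketched with `power.t.test` from the stats package of base R. This is a swapped-in function, not the OPDOE routine of the text.

```r
# Sketch: size n per sample for the two-sample t-test with equal sizes
# (stats::power.t.test is used here instead of OPDOE's size.t.test).
res <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.9,
                    type = "two.sample", alternative = "two.sided")
n_per_sample <- ceiling(res$n)        # res$n is the real-valued size per sample
total_size <- 2 * n_per_sample
print(c(n_per_sample = n_per_sample, total = total_size))
```

Note that `power = 0.9` corresponds to $\beta = 0.1$; the command shown above for OPDOE uses `power = 0.8`.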


Warning: It should be explicitly mentioned here that the two-sample t-test is not really suitable for practical applications. This follows from an article published by Rasch et al. (2011a). Some comments can also be found at the end of this section concerning robustness. We urgently recommend using the Welch test instead of the two-sample t-test. This test is described now.


3.4.2.1.2 Welch Test 

We previously assumed that the unknown variances of the populations from which both samples are taken are equal. But often this is not fulfilled or not reliably known. Then we advise for practical purposes applying an approximate t-test, namely, a test whose test statistic is nearly t-distributed. Such a test is sufficiently precise concerning practical investigations. Moreover, it is a so-called conservative test, meaning a test guaranteeing a risk of the first kind not larger than the prescribed $\alpha$.

The distribution of the test statistic

\[
t^* = \frac{\bar{y}_1 - \bar{y}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \quad s_k^2 = \frac{1}{n_k - 1}\sum_{i=1}^{n_k}(y_{ki} - \bar{y}_k)^2,\ k = 1, 2
\]

for unknown variances was derived by Welch (1947). The result is given in the next theorem.

Theorem 3.18 (Welch)
Let $Y_1 = (y_{11}, \ldots, y_{1n_1})^T$, $Y_2 = (y_{21}, \ldots, y_{2n_2})^T$ be two independent random samples with components $y_{ij}$ distributed as $N(\mu_i, \sigma_i^2)$. Introducing the notations

\[
\gamma = \frac{\frac{\sigma_1^2}{n_1}}{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \quad b = \frac{(n_1 - 1)\frac{s_1^2}{\sigma_1^2}}{(n_1 - 1)\frac{s_1^2}{\sigma_1^2} + (n_2 - 1)\frac{s_2^2}{\sigma_2^2}}
\]

and

\[
p(b) = \frac{1}{B\left(\frac{n_1 - 1}{2}, \frac{n_2 - 1}{2}\right)}\, b^{\frac{n_1 - 3}{2}}(1 - b)^{\frac{n_2 - 3}{2}}
\]

with values $B\left(\frac{n_1 - 1}{2}, \frac{n_2 - 1}{2}\right)$ of the beta function, the distribution function of $t^*$ in the case $\mu_1 = \mu_2$ is given by

\[
F(t^*) = \int_0^1 H_{n_1 + n_2 - 2}\left(t^*\sqrt{(n_1 + n_2 - 2)\left[\frac{\gamma b}{n_1 - 1} + \frac{(1 - \gamma)(1 - b)}{n_2 - 1}\right]}\right) p(b)\,db,
\]

where $H_{n_1 + n_2 - 2}$ is the distribution function of the central t-distribution with $n_1 + n_2 - 2$ degrees of freedom.

The proof of the theorem is contained, for example, in Welch (1947) or in Trickett and Welch (1954). The critical value of $t^*$ can only be iteratively determined. An iterative method is presented in Trickett and Welch (1954), Trickett et al. (1956) and Pearson and Hartley (1970). Tables listing critical values are given in Aspin (1949).

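If the pair

\[
H_0: \mu_1 = \mu_2 = \mu,\ \sigma_1^2, \sigma_2^2 \text{ arbitrary}
\]
\[
H_A: \mu_1 \ne \mu_2,\ \sigma_1^2, \sigma_2^2 \text{ arbitrary}
\]

of hypotheses is to be tested, often the approximate test statistic

\[
t^* = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
\]

is used. $H_0$ is rejected if $|t^*|$ is larger than the corresponding quantile of the central t-distribution with

\[
f^* = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{s_1^4}{n_1^2(n_1 - 1)} + \frac{s_2^4}{n_2^2(n_2 - 1)}}
\]

degrees of freedom.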
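The approximate test can be sketched in R as follows; the result is compared with `t.test`, whose default for two samples is the Welch approximation. The helper name `welch_test` and the data values are hypothetical.

```r
# Sketch: approximate (Welch) test with the degrees of freedom f* given above
welch_test <- function(y1, y2, alpha = 0.05) {
  n1 <- length(y1)
  n2 <- length(y2)
  v1 <- var(y1) / n1                                   # s1^2 / n1
  v2 <- var(y2) / n2                                   # s2^2 / n2
  t_star <- (mean(y1) - mean(y2)) / sqrt(v1 + v2)
  f_star <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  crit <- qt(1 - alpha / 2, df = f_star)               # quantile for non-integer df
  list(t = t_star, df = f_star, reject = abs(t_star) > crit)
}

y1 <- c(12.1, 11.4, 13.0, 12.6, 11.9, 12.8)            # illustrative values only
y2 <- c(13.4, 10.9, 15.6, 14.8, 11.7, 16.2, 12.5)      # illustrative values only
res <- welch_test(y1, y2)
ref <- t.test(y1, y2)                                  # Welch test is the default (var.equal = FALSE)
print(res)
```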


Example 3.14 Optimum Sample Size (continuation)

We want to determine the size of both samples so that the precision requirements $\alpha = 0.05$; $\beta = 0.1$; $\sigma_x = C\sigma_y$ with known $C$ and $\delta = \mu_1 - \mu_2 = 0.9\sigma_y$ are fulfilled. Using these data, the R package OPDOE from CRAN can calculate the sizes of both samples. As before we have to put $P = 1 - \alpha$ for one-sided alternatives and $P = 1 - \alpha/2$ for two-sided alternatives. The sequence of commands in R has to be only slightly changed compared with the one for the t-test.

Concerning calculations by hand, we use $n_y = \frac{\sigma_y}{\sigma_x}n_x$ and again the approximation

\[
n_x \approx \left\lceil \frac{\sigma_x(\sigma_x + \sigma_y)}{\delta^2}\left[t(f^* \mid P) + t(f^* \mid 1 - \beta)\right]^2 \right\rceil.
\]

The data in this example supply the sizes $n_x = 105$ and $n_y = 27$ for $\sigma_x = 4\sigma_y$ ($C = 4$).


Hints for Program Packages 
At this point, we give an introduction to the package IBM SPSS 24: Statistics 
(SPSS in short). When we open the package, we find a data matrix (which is 
empty at the beginning) into which we will put our data. 

Clicking on variables you can give names for the characters you wish to enter 
as shown in Figure 3.4. 


Figure 3.4 SPSS data file in variable view. Source: Reproduced with permission of IBM.
Table 3.3 The litter weights of mice (in g) of Example 3.15.

 i     x_i     y_i
 1     7.6     7.8
 2    13.2    11.1
 3     9.1    16.4
 4    10.6    13.7
 5     8.7    10.7
 6    10.6    12.3
 7     6.8    14.0
 8     9.9    11.9
 9     7.3     8.8
10    10.4     7.7
11    13.3     8.9
12    10.0    16.4
13     9.5    10.2

Let us consider an example. 


Example 3.15 Two independent samples of 13 mice are drawn from two 
mouse populations (26 mice in all). The x- and y-values are the litter weights 
of the first litters of mice in populations 1 (x;) and 2 (y,), respectively, and given 
in Table 3.3. 

We will now enter these data into an SPSS data file. First we need to rename var in the first column as x and var in the second column as y, as already done in Figure 3.5. We also need three digits in each column and one decimal place. To do this we change from Data View to Variable View (the tabs at the bottom left of the window in Figure 3.5). Now we can change the variable names to x and y and the number of decimal places to 1. Having returned to Data View, we enter the data values. We save the file under the name mice-data.sav. The SPSS file is shown in Figure 3.5.

SPSS allows us first to calculate some descriptive statistics from the observations via


Analyze
Descriptive Statistics
Descriptives


and then we choose Options as shown in Figure 3.6.
Here we select the statistics we want and receive the output in Figure 3.7.
Figure 3.5 SPSS data file for Example 3.15. Source: Reproduced with permission of IBM.

Figure 3.6 Options in descriptive statistics. Source: Reproduced with permission of IBM.
DESCRIPTIVES VARIABLES=x y
  /STATISTICS=MEAN STDDEV VARIANCE RANGE MIN MAX.

Descriptive Statistics

      Range   Minimum   Maximum   Mean     Std. Deviation   Variance
x     6.5     6.8       13.3       9.769   1.9868           3.947
y     8.7     7.7       16.4      11.531   2.9519           8.714

Figure 3.7 SPSS output of Example 3.15. Source: Reproduced with permission of IBM.
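
The descriptive statistics of Figure 3.7 can be checked without SPSS. The following is a minimal Python sketch (the book itself works with SPSS and R; the function name `describe` is only illustrative) that recomputes the selected statistics from the data of Table 3.3 with the standard library.

```python
import statistics

# Litter weights (in g) from Table 3.3
x = [7.6, 13.2, 9.1, 10.6, 8.7, 10.6, 6.8, 9.9, 7.3, 10.4, 13.3, 10.0, 9.5]
y = [7.8, 11.1, 16.4, 13.7, 10.7, 12.3, 14.0, 11.9, 8.8, 7.7, 8.9, 16.4, 10.2]

def describe(sample):
    """Return the statistics selected in the Options dialog of Figure 3.6."""
    return {
        "range": max(sample) - min(sample),
        "min": min(sample),
        "max": max(sample),
        "mean": statistics.mean(sample),
        "sd": statistics.stdev(sample),            # divisor n - 1
        "variance": statistics.variance(sample),   # divisor n - 1
    }

desc_x = describe(x)
desc_y = describe(y)
for name, d in (("x", desc_x), ("y", desc_y)):
    print(name, {key: round(value, 4) for key, value in d.items()})
```

Up to rounding, the printed values should agree with the SPSS output.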


To test the hypothesis that the expectations of both variables are equal against a two-sided alternative, we first have to rearrange the data into one column named 'weight' and a second column 'group', in which we put a '1' for the first 13 values and a '2' for the others, as done in Figure 3.8.


Figure 3.8 Rearranged data of Example 3.15. Source: Reproduced with permission of IBM.
Now we proceed with


Analyze
Compare means
Independent samples t-test


and receive Figure 3.9.

In the upper row we find the result for the t-test and, below that, the result for the Welch test. As we mentioned above, we always use only the Welch test output. The decision about the rejection of the null hypothesis is as follows: if the risk of the first kind chosen in advance is larger than the significance value in the output, we reject the null hypothesis. In our example the null hypothesis must be accepted if $\alpha = 0.05$ (because 0.05 is below the value 0.089 in the output).

Confidence intervals can be found in the corresponding test output to the right of the test results.
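
To make the Welch row of the output more transparent, the following Python sketch recomputes the test statistic, the approximate degrees of freedom $f^*$ and a two-sided significance value for the data of Table 3.3. It is only an illustration under stated assumptions: the helper names are hypothetical, and the t-distribution is integrated numerically with Simpson's rule because the Python standard library offers no t-distribution.

```python
import math
import statistics

x = [7.6, 13.2, 9.1, 10.6, 8.7, 10.6, 6.8, 9.9, 7.3, 10.4, 13.3, 10.0, 9.5]
y = [7.8, 11.1, 16.4, 13.7, 10.7, 12.3, 14.0, 11.9, 8.8, 7.7, 8.9, 16.4, 10.2]

def welch_statistic(a, b):
    """Welch test statistic and approximate degrees of freedom f*."""
    va = statistics.variance(a) / len(a)   # s_1^2 / n_1
    vb = statistics.variance(b) / len(b)   # s_2^2 / n_2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    f_star = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, f_star

def t_two_sided_p(t, df, steps=2000):
    """Two-sided tail probability of the central t-distribution (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    c = math.exp(log_c)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps                      # steps must be even
    s = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * pdf(k * h)
    central = s * h / 3                    # P(0 < T < |t|)
    return 2 * (0.5 - central)

t_welch, f_star = welch_statistic(x, y)
p_value = t_two_sided_p(t_welch, f_star)
print(round(t_welch, 3), round(f_star, 2), round(p_value, 3))
```

The significance value obtained in this way should be close to the value 0.089 quoted from the SPSS output, so the null hypothesis is not rejected at $\alpha = 0.05$.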


Figure 3.9 SPSS output for comparing means. Source: Reproduced with permission of IBM.


3.4.2.1.3 Wilcoxon-Mann-Whitney Test

Assume that we do not know whether the sample components of a two-sample problem are normally distributed, but the distributions are continuous, all moments exist and at most the expectations of the distributions are different. Then the pair

$H_0: \mu_1 = \mu_2 = \mu$, all higher moments equal, but arbitrary
$H_A: \mu_1 \neq \mu_2$, all higher moments equal, but arbitrary

of hypotheses can also be written as

$H_0: f_1(y_1) = f_2(y_2)$; $H_A: f_1(y_1) \neq f_2(y_2)$,

where $f_1(y_1)$, $f_2(y_2)$ are the densities of both distributions. If higher moments of the distributions are different (e.g. $\sigma_1^2 \neq \sigma_2^2$, or the skewness and excess of both distributions differ), then the rejection of the null hypothesis does not say anything about the expectations. However, if the equality of all $k$th moments ($k \geq 2$) of both distributions is guaranteed, then non-parametric tests can be used for the hypotheses. Such tests are generally not treated in this book (see Bagdonavicius et al., 2011; Rasch et al., 2011c). We only want to describe a special representative, the Wilcoxon test, also called the Mann-Whitney test (see Wilcoxon, 1945; Mann and Whitney, 1947).

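For $i = 1, \ldots, n_1$; $j = 1, \ldots, n_2$, we consider

$$d_{ij} = \begin{cases} 1 & \text{for } y_{2j} < y_{1i} \\ 0 & \text{for } y_{2j} > y_{1i} \end{cases}$$

The equality occurs for continuous random variables with probability 0. In Rasch et al. (2011c) it is described how to proceed in practical cases if equality happens (ties).
The test statistic is

$$U = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} d_{ij} .$$

If $F_i(y_i)$ are the distribution functions of $y_i$ ($i = 1, 2$) and if

$$p = P(y_2 < y_1) = \int_{-\infty}^{\infty}\int_{-\infty}^{y_1} f_2(y_2) f_1(y_1)\, dy_2\, dy_1 = \int_{-\infty}^{\infty} F_2(y_1) f_1(y_1)\, dy_1 ,$$

then $H_0: f_1(y_1) = f_2(y_2)$ implies

$$p = \int_{-\infty}^{\infty} F_1(t) f_1(t)\, dt = \frac{1}{2} .$$

The $n_1 n_2$ random variables $d_{ij}$ are distributed as $B(1, p)$, where $E(d_{ij}) = p$ and $\mathrm{var}(d_{ij}) = p(1-p)$. Mann and Whitney (1947) showed

$$E(U|H_0) = \frac{n_1 n_2}{2}, \quad \mathrm{var}(U|H_0) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12} .$$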
Further, the distribution of $U$ is under $H_0$ symmetric with respect to $\dfrac{n_1 n_2}{2}$.

With the notation $U' = n_1 n_2 - U$ the function

$$k_U(Y) = \begin{cases} 1 & \text{for } U < c_{\alpha/2} \text{ or } U' < c_{\alpha/2} \\ 0 & \text{else} \end{cases}$$

is an $\alpha$-test, provided that $c_{\alpha/2}$ is determined by $P(U < c_{\alpha/2}|H_0) = \alpha/2$. The random variable

$$W = U + \frac{n_1(n_1 + 1)}{2}$$

is equal to the sum of the ranks of the $n_1$ random variables $y_{1i}$ in the vector of the ranks of the composed random vector $Y = (Y_1^T, Y_2^T)^T$, and it is the test statistic of the Wilcoxon test. Therefore $k_U(Y)$ is equivalent to the test

$$k_W(Y) = \begin{cases} 1 & \text{for } W < W_{u,\alpha/2} \text{ or } W > W_{o,\alpha/2} \\ 0 & \text{else} \end{cases}$$

The quantiles $W_{u,\alpha/2}$ and $W_{o,\alpha/2}$ of this test can for $n_i > 20$ be approximated with the help of the quantiles of the standard normal distribution; for smaller $n_i$ these quantiles should be calculated with the help of R.


Example 3.15 (continued) 
For the data in Figure 3.8, we now calculate the Mann-Whitney test by SPSS. 
We use 


Analyze 
Nonparametric Tests 
Independent Samples 


and use the entry Fields, putting weight as Test Fields and group as Groups. Then we use Run and obtain Figure 3.10.
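
The quantities $U$ and $W$ defined above can also be computed directly from the data of Table 3.3. The following Python sketch (function names are illustrative) counts the pairs with $y_{2j} < y_{1i}$, checks the relation $W = U + n_1(n_1+1)/2$ through the rank sum, and adds a normal approximation of the two-sided significance. Since here $n_1 = n_2 = 13 < 20$, the normal approximation is shown only for orientation and is not a substitute for the exact quantiles.

```python
from statistics import NormalDist

x = [7.6, 13.2, 9.1, 10.6, 8.7, 10.6, 6.8, 9.9, 7.3, 10.4, 13.3, 10.0, 9.5]    # y_1i
y = [7.8, 11.1, 16.4, 13.7, 10.7, 12.3, 14.0, 11.9, 8.8, 7.7, 8.9, 16.4, 10.2]  # y_2j

def mann_whitney_u(sample1, sample2):
    """U = number of pairs (i, j) with y_2j < y_1i (no ties between the samples assumed)."""
    return sum(1 for a in sample1 for b in sample2 if b < a)

def rank_sum(sample1, sample2):
    """Sum of the (mid-)ranks of sample1 within the pooled sample."""
    pooled = sample1 + sample2
    total = 0.0
    for a in sample1:
        less = sum(1 for v in pooled if v < a)
        equal = sum(1 for v in pooled if v == a)
        total += less + (equal + 1) / 2
    return total

n1, n2 = len(x), len(y)
u = mann_whitney_u(x, y)
w = rank_sum(x, y)

mean_u = n1 * n2 / 2                      # E(U | H0)
var_u = n1 * n2 * (n1 + n2 + 1) / 12      # var(U | H0)
z = (u - mean_u) / var_u ** 0.5
p_approx = 2 * NormalDist().cdf(-abs(z))  # normal approximation, two-sided

print(u, w, round(z, 3), round(p_approx, 4))
```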


3.4.2.1.4 Robustness 

All statistical tests in this chapter are proved to be $\alpha$-tests and to have other desired properties if some distributional assumptions are fulfilled. An experimenter looking for a proper statistical test often does not know whether or not these assumptions are fulfilled, or even knows that they are not fulfilled. How can we help in this situation? Certainly not by deriving some theorems about this topic.
Hypothesis Test Summary. Null hypothesis: The distribution of weight is the same across categories of group. Test: Independent-Samples Mann-Whitney U Test. Asymptotic significances are displayed. The significance level is .05. Exact significance is displayed for this test.

Figure 3.10 SPSS output of the Mann-Whitney test.


We give here an introduction to methods of empirical statistics (see Chapter 1) via simulations, methods that will be used later not only in this chapter but also in other chapters (especially in Chapter 11).

General problems concerning robustness are not thoroughly discussed in this book. We restrict ourselves to those comments that are necessary for understanding the tests presented above (and later). The robustness of a statistical method means that the essential properties of this method are relatively insensitive to variations of the assumptions. We especially want to investigate the robustness of the methods in Section 3.4.2.1 with respect to violating normality or variance equality. Problems of robustness are discussed in detail by Rasch and Guiard (2004).


Definition 3.8 Let $k_\alpha$ be an $\alpha$-test ($0 < \alpha < 1$) for the pair $\{H_0, H_A\}$ of hypotheses in the class $G_1$ of distributions of the random sample $Y$ with size $n$. And let $G_2$ be a class of distributions containing $G_1$ and at least one distribution that does not fulfil all assumptions for guaranteeing $k_\alpha$ to be an $\alpha$-test.

Finally, let $\alpha(g)$ be the risk of the first kind for $k_\alpha$ concerning the element $g$ of $G_2$ (estimated by simulation). Here and in the sequel, we write $\alpha(g) = \alpha_{act}$, the actual $\alpha$, and the fixed $\alpha$, the nominal $\alpha$, is written as $\alpha_{nom}$.

Then $k_\alpha$ is said to be $(1 - \varepsilon)$-robust in the class $G_2$ if

$$\max_{g \in G_2} |\alpha_{act} - \alpha_{nom}| \leq \varepsilon .$$

We call a statistical test acceptable if $100(1 - \varepsilon)\% \geq 80\%$.
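
How $\alpha_{act}$ is estimated by simulation can be sketched as follows. The Python example below is a small illustration and not the design used in the cited simulation studies: it draws normal samples under a true null hypothesis, applies the two-sample t-test with pooled variance and counts the rejections. The sample sizes, standard deviations, number of runs and the seed are assumptions chosen for illustration; the critical value 2.101 is the tabulated quantile $t(18|0.975)$ and fits only $n_1 + n_2 - 2 = 18$.

```python
import math
import random

T_CRIT = 2.101  # tabulated t(18 | 0.975); valid only for n1 + n2 - 2 = 18

def mean_var(values):
    """Sample mean and sample variance (divisor n - 1)."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    m1, v1 = mean_var(a)
    m2, v2 = mean_var(b)
    s2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(s2 * (1 / n1 + 1 / n2))

def actual_alpha(n1, n2, sd1, sd2, runs=5000, seed=1):
    """Relative frequency of rejections when both expectations are equal."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        a = [rng.gauss(0.0, sd1) for _ in range(n1)]
        b = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(a, b)) > T_CRIT:
            rejections += 1
    return rejections / runs

alpha_nom = 0.05
alpha_equal = actual_alpha(10, 10, 1.0, 1.0)    # assumptions fulfilled
alpha_unequal = actual_alpha(5, 15, 3.0, 1.0)   # unequal variances, unequal sizes

for label, a_act in (("equal variances", alpha_equal), ("unequal variances", alpha_unequal)):
    print(label, a_act, "deviation:", round(abs(a_act - alpha_nom), 4))
```

With equal variances the estimated $\alpha_{act}$ should stay near the nominal 0.05, whereas in the second setting, where the smaller sample has the larger variance, the deviation $|\alpha_{act} - \alpha_{nom}|$ becomes clearly visible.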

For example, elements of the set difference $G_2 \setminus G_1$ are distributions with $\sigma_1^2 \neq \sigma_2^2$ for the two-sample t-test and the Wilcoxon test as well as non-normal distributions for the t-test and the Welch test. Rasch and Guiard (2004) report about extensive simulation experiments investigating the robustness of the t-test in a set of 87 distributions of the Fleishman system (Fleishman, 1978) as well as the robustness of the two-sample t-test and the Wilcoxon test for unequal variances. The results showed that both the one-sample t-test and the two-sample t-test (and also the corresponding confidence intervals given in Section 3.5) are extremely robust with respect to deviations from the normal distribution. For unequal variances, however, Rasch et al. (2011a) conclude that the two-sample t-test cannot be recommended, and that it also makes no sense to check in a pretest whether the variances of both random samples are equal or not. In most cases the Wilcoxon test yields unsatisfactory results, too. Only the Welch test works well. Its power is nearly that of the two-sample t-test if both variances are equal. Moreover, for unequal variances, this test obeys the given risks in the sense of 80% robustness even for non-normal distributions with a skewness $|\gamma_1| < 3$.


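3.4.3 Comparison of Two Variances


A UMPU-α-Test
A UMPU-$\alpha$-test exists for the pair

$H_0: \sigma_1^2 = \sigma_2^2$; $\mu_1, \mu_2$ arbitrary
$H_A: \sigma_1^2 \neq \sigma_2^2$; $\mu_1, \mu_2$ arbitrary

of hypotheses and the random samples $Y_1 = (y_{11}, \ldots, y_{1n_1})^T$, $Y_2 = (y_{21}, \ldots, y_{2n_2})^T$, where the components $y_{ij}$ are distributed as $N(\mu_i, \sigma_i^2)$. The random vector $Y = (Y_1^T, Y_2^T)^T$ has a distribution from a four-parametric exponential family with the natural parameters

$$\eta_1 = \lambda = -\frac{1}{2}\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right), \quad \eta_2, \eta_3, \eta_4$$

and the sufficient statistics $S$, $T^* = (T_1^*, T_2^*, T_3^*)^T$ given by

$$S = \sum_{i=1}^{n_1} y_{1i}^2, \quad T_1^* = \sum_{i=1}^{n_1} y_{1i}^2 + \sum_{j=1}^{n_2} y_{2j}^2, \quad T_2^* = \bar{y}_1, \quad T_3^* = \bar{y}_2 .$$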


Under $H_0$ we have $\dfrac{\sigma_1^2}{\sigma_2^2} = 1$, and the random variable

$$F = \frac{\dfrac{1}{n_1 - 1}\sum_{i=1}^{n_1}(y_{1i} - \bar{y}_1)^2}{\dfrac{1}{n_2 - 1}\sum_{j=1}^{n_2}(y_{2j} - \bar{y}_2)^2}$$

does not depend on $\mu_1$, $\mu_2$ and $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Hence, $F$ is independent of $T^*$. Therefore Theorem 3.16 can be used. The random variable $F$ is centrally distributed as $F(n_1 - 1, n_2 - 1)$ under $H_0$. The function

$$k(Y) = \begin{cases} 1 & \text{if } F < F\left(n_1 - 1, n_2 - 1 \,\middle|\, \frac{\alpha}{2}\right) \text{ or } F > F\left(n_1 - 1, n_2 - 1 \,\middle|\, 1 - \frac{\alpha}{2}\right) \\ 0 & \text{else} \end{cases}$$

defines a UMPU-$\alpha$-test, where $F(n_1 - 1, n_2 - 1 | P)$ is the $P$-quantile of the F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. These quantiles for $\alpha = 0.05$ can be found in Table D.5. This test is very sensitive to deviations from the normal distribution. Therefore the following Levene test should be used instead in applications.


Levene Test 

Box (1953) already mentioned the extreme non-robustness of the F-test for comparing two variances (introduced at the beginning of this Section 3.4.2.2). Rasch and Guiard (2004) report on extensive simulation experiments devoted to this problem. The results of Box show that non-robustness has to be expected even for relatively small deviations from the normal distribution. Hence, we generally suggest applying the test of Levene (1960), which is described now.

For $j = 1, 2$ we put (here $y_{ij}$ denotes the $i$th observation of sample $j$)

$$z_{ij} = (y_{ij} - \bar{y}_j)^2; \quad i = 1, \ldots, n_j$$

and

$$SS_{between} = \sum_{j=1}^{2}\sum_{i=1}^{n_j} (\bar{z}_j - \bar{z})^2, \quad SS_{within} = \sum_{j=1}^{2}\sum_{i=1}^{n_j} (z_{ij} - \bar{z}_j)^2 ,$$

where $\bar{z}_j = \dfrac{1}{n_j}\sum_{i=1}^{n_j} z_{ij}$ and $\bar{z} = \dfrac{1}{n_1 + n_2}\sum_{j=1}^{2}\sum_{i=1}^{n_j} z_{ij}$.
The null hypothesis $H_0$ is rejected if

$$F^* = \frac{SS_{between}}{SS_{within}}\,(n_1 + n_2 - 2) > F\left(1, n_1 + n_2 - 2 \,\middle|\, 1 - \frac{\alpha}{2}\right).$$
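
The statistic $F^*$ is easy to compute by hand or by program. The following Python sketch (the function name `levene_f` is illustrative) follows the definitions above literally, with squared deviations $z_{ij}$, and evaluates $F^*$ for the litter weights of Table 3.3. The comparison with the F-quantile is left out because the Python standard library contains no F-distribution; the quantile has to be taken from a table or from R.

```python
def levene_f(sample1, sample2):
    """F* = SS_between / SS_within * (n1 + n2 - 2) with z_ij = (y_ij - mean_j)^2."""
    z_groups = []
    for group in (sample1, sample2):
        m = sum(group) / len(group)
        z_groups.append([(v - m) ** 2 for v in group])

    z_means = [sum(z) / len(z) for z in z_groups]
    n_total = sum(len(z) for z in z_groups)
    z_bar = sum(sum(z) for z in z_groups) / n_total

    # the inner sum over i contributes n_j equal terms (z_bar_j - z_bar)^2
    ss_between = sum(len(z) * (zm - z_bar) ** 2 for z, zm in zip(z_groups, z_means))
    ss_within = sum((v - zm) ** 2 for z, zm in zip(z_groups, z_means) for v in z)
    return ss_between / ss_within * (n_total - 2)

x = [7.6, 13.2, 9.1, 10.6, 8.7, 10.6, 6.8, 9.9, 7.3, 10.4, 13.3, 10.0, 9.5]
y = [7.8, 11.1, 16.4, 13.7, 10.7, 12.3, 14.0, 11.9, 8.8, 7.7, 8.9, 16.4, 10.2]

f_star_mice = levene_f(x, y)
print(round(f_star_mice, 3))   # to be compared with the quantile of F(1, 24)
```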


3.4.4 Table for Sample Sizes 


We present in Table 3.4 an overview listing formulae to determine the sample 
sizes for testing hypotheses. 
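Table 3.4 Approximate sample sizes for testing hypotheses with given risks $\alpha$, $\beta$ and given minimum difference $\delta$ (apply $P = 1 - \alpha$ for one-sided tests and $P = 1 - \alpha/2$ for two-sided tests).


Parameter: Sample size

$\mu$: $n \approx \left\lceil \left[\{t(n-1;P) + t(n-1;1-\beta)\}\dfrac{\sigma}{\delta}\right]^2 \right\rceil$

$\mu_x - \mu_y$, paired observations: $n \approx \left\lceil \left[\{t(n-1;P) + t(n-1;1-\beta)\}\dfrac{\sigma_\Delta}{\delta}\right]^2 \right\rceil$

$\mu_x - \mu_y$, independent samples, equal variances: $n \approx \left\lceil 2\left[\{t(n-1;P) + t(n-1;1-\beta)\}\dfrac{\sigma}{\delta}\right]^2 \right\rceil$

$\mu_x - \mu_y$, independent samples, unequal variances: $n_x \approx \left\lceil \dfrac{\sigma_x(\sigma_x + \sigma_y)}{\delta^2}\{t(f^*;P) + t(f^*;1-\beta)\}^2 \right\rceil$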


Probability p 


3 [21-« Vp-p) +2» Vrid-Pd], 
(1 Po)’ 


2(P + 1- + 
Probabilities p, and p, Ho : p, = Po n= ea (P) | +n( Pr) A p))* 


2(1- B)/Px( Le ete -P) 


3.5 Confidence Estimation 


In applications, the user is seldom content with point estimates for unknown parameters. Rather, the user often tries to calculate or estimate the variance of the estimator. If this variance is sufficiently small, then there is no cause to distrust the estimated value.


Definition 3.9 Let $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T$ be a random sample with realisations $Y \in \{Y\}$, whose components are distributed as $P_\theta \in P = \{P_\theta : \theta \in \Omega\}$. Let $S(\boldsymbol{Y})$ be a measurable mapping of the sample space onto the parameter space and $K(\boldsymbol{Y})$ be a random set with realisations $K(Y)$ in $\Omega$. Further, let $P^S$ be the probability measure induced by $S(\boldsymbol{Y})$. Then $K(\boldsymbol{Y})$ is said to be a confidence region for $\theta$ with the corresponding confidence coefficient (confidence level) $1 - \alpha$ if

$$P^S[\theta \in K(\boldsymbol{Y})] = P[\theta \in K(\boldsymbol{Y})] \geq 1 - \alpha \text{ for all } \theta \in \Omega. \quad (3.48)$$

In a condensed form, $K(\boldsymbol{Y})$ is also said to be a $(1 - \alpha)$ confidence region. If $\Omega \subset R^1$ and $K(Y)$ is a connected set for all $Y \in \{Y\}$, then $K(\boldsymbol{Y})$ is called a confidence interval. The realisation $K(Y)$ of a confidence region is called a realised confidence region.

Interval estimation includes the construction of confidence intervals. It stands beside point estimation. Nevertheless, we will see that there are analogies to test theory concerning the optimality of confidence intervals that can be exploited to simplify many considerations. That is the reason for treating this subject in the chapter about tests.
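Example 3.16 Let the $n \geq 1$ components of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be distributed as $N(\mu, \sigma^2)$, where $\sigma^2$ is known. We consider the measurable mapping $S(Y) = \bar{y}$ from $\{Y\}$ onto $\Omega = R^1$. The mean $\bar{y}$ follows a $N\left(\mu, \dfrac{\sigma^2}{n}\right)$-distribution. A $(1 - \alpha)$ confidence region $K(Y)$ with respect to $\mu$ has to satisfy $P[\mu \in K(Y)] = 1 - \alpha$ (here we write $P$ for $P^S$). We suppose that $K(Y)$ is a connected set, that is, an interval $K(Y) = [\mu_l, \mu_u]$. This means

$$P(\mu_l \leq \mu \leq \mu_u) = 1 - \alpha .$$

Since $\bar{y}$ is distributed as $N\left(\mu, \dfrac{\sigma^2}{n}\right)$, it holds

$$P\left\{ z_{\alpha_1} \leq \frac{\bar{y} - \mu}{\sigma}\sqrt{n} \leq z_{1-\alpha_2} \right\} = 1 - \alpha_1 - \alpha_2 = 1 - \alpha$$

for $\alpha_1 + \alpha_2 = \alpha$, $\alpha_1 \geq 0$, $\alpha_2 \geq 0$. Consequently, we have

$$P\left\{ \bar{y} - \frac{\sigma}{\sqrt{n}} z_{1-\alpha_2} \leq \mu \leq \bar{y} - \frac{\sigma}{\sqrt{n}} z_{\alpha_1} \right\} = 1 - \alpha ,$$

so that $\mu_l = \bar{y} - \dfrac{\sigma}{\sqrt{n}} z_{1-\alpha_2}$ and $\mu_u = \bar{y} - \dfrac{\sigma}{\sqrt{n}} z_{\alpha_1}$ are fulfilled. For $1 - \alpha$ there are infinitely many confidence intervals according to the choice of $\alpha_1$ and $\alpha_2 = \alpha - \alpha_1$. If $\alpha_1 = 0$ or $\alpha_2 = 0$, respectively, then the confidence intervals are one-sided (i.e. only one interval end is random). The more the values $\alpha_1$ and $\alpha_2$ differ from each other, the larger is the expected width $E(\mu_u - \mu_l) = \dfrac{\sigma}{\sqrt{n}}(z_{1-\alpha_2} - z_{\alpha_1})$. For example, the width becomes infinite for $\alpha_1 = 0$ or for $\alpha_2 = 0$. Finite confidence intervals result for $\alpha_1 > 0$, $\alpha_2 > 0$.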
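
The dependence of the width on the split of $\alpha$ into $\alpha_1$ and $\alpha_2$ can be illustrated numerically. The following Python sketch uses hypothetical values ($\bar{y} = 10$, $\sigma = 2$, $n = 16$, $\alpha = 0.05$), which are not taken from the text, and the standard normal quantiles of the standard library.

```python
from statistics import NormalDist

def interval(y_bar, sigma, n, alpha1, alpha2):
    """[mu_l, mu_u] for known sigma with alpha1 + alpha2 = alpha, both > 0."""
    z = NormalDist().inv_cdf
    se = sigma / n ** 0.5
    mu_l = y_bar - se * z(1 - alpha2)
    mu_u = y_bar - se * z(alpha1)
    return mu_l, mu_u

# Hypothetical values for illustration only
y_bar, sigma, n, alpha = 10.0, 2.0, 16, 0.05

widths = {}
for alpha1 in (0.025, 0.01, 0.001):
    lower, upper = interval(y_bar, sigma, n, alpha1, alpha - alpha1)
    widths[alpha1] = upper - lower
    print(alpha1, round(lower, 3), round(upper, 3), round(widths[alpha1], 3))
```

The symmetric split $\alpha_1 = \alpha_2 = \alpha/2$ gives the shortest interval; the width grows as the split becomes more unequal.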

Now we set conditions that help to select suitable confidence intervals from the huge number of possible ones. First, $K(Y)$ ought to be connected and finite with probability 1. Additionally, for fixed $\alpha$ we prefer confidence intervals possessing the smallest width or the smallest expected width with respect to all $\theta \in \Omega$.


3.5.1 One-Sided Confidence Intervals in One-Parametric Distribution Families

Definition 3.10 Let the components of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be distributed as $P_\theta \in P = \{P_\theta, \theta \in \Omega\}$, where $\Omega = (\theta_1, \theta_2)$ and the improper values $-\infty$ for $\theta_1$ and $+\infty$ for $\theta_2$ are admitted. Then

$$K_L = K_L(Y) = [\theta_u(Y), \theta_2)$$

and

$$K_R = K_R(Y) = (\theta_1, \theta_o(Y)],$$

respectively, are said to be one-sided confidence intervals for $\theta$ with the confidence coefficient $1 - \alpha$ if

$$P_\theta\{\theta \in K_L\} \geq 1 - \alpha \text{ and } P_\theta\{\theta \in K_R\} \geq 1 - \alpha, \quad (3.49)$$

respectively. $K_L$ is called a left-sided and $K_R$ a right-sided confidence interval. A left-sided (right-sided) confidence interval with coefficient $1 - \alpha$ is said to be a uniformly most powerful confidence interval (UMP $(1 - \alpha)$-interval), if for each $\theta^* < \theta$ [$\theta^* > \theta$], $\theta^* \in \Omega$ the probability

$$P_\theta\{\theta_u(Y) \leq \theta^*\} \quad [P_\theta\{\theta_o(Y) \geq \theta^*\}]$$

becomes minimal under the condition (3.49).

A two-sided confidence interval $K(Y)$ satisfying (3.48) is called a uniformly most powerful confidence interval (UMP $(1 - \alpha)$-interval), if for each $\theta^* \neq \theta$, $\theta^* \in \Omega$ the probability $P_\theta\{\theta^* \in K(Y)\}$ becomes minimal.

As we will see there is a close relation between UMP-$\alpha$-tests and UMP $(1 - \alpha)$-intervals. At first we state more generally the relation between $\alpha$-tests and confidence intervals with the coefficient $1 - \alpha$.


Theorem 3.19 Let the components of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be distributed as $P_\theta \in P = \{P_\theta : \theta \in \Omega\}$. For each $\theta_0 \in \Omega \subset R^1$, let $\{Y_0\} \subset \{Y\}$ be the region of the sample space $\{Y\}$ where the null hypothesis $H_0: \theta = \theta_0$ is accepted. Let $K(Y)$ be for each $Y \in \{Y\}$ the subset

$$K(Y) = \{\theta \in \Omega : Y \in \{Y_0\}\} \quad (3.50)$$

of the parameter space $\Omega$. Then $K(Y)$ is a $(1 - \alpha)$-confidence interval, if a test with a risk of the first kind not larger than $\alpha$ is defined by $\{Y_0\}$. If moreover $\{Y_0\}$ defines a UMP-$\alpha$-test, then $K(Y)$ is a UMP $(1 - \alpha)$-interval.


Proof: Since $\theta \in K(Y)$ iff $Y \in \{Y_0\}$, it follows

$$P_\theta\{\theta \in K(Y)\} = P_\theta\{Y \in \{Y_0\}\} \geq 1 - \alpha .$$

If $K^*(Y)$ is another $(1 - \alpha)$-confidence interval for $\theta$ and if $\{Y_0^*\} = \{Y : \theta \in K^*(Y)\}$, then we analogously get

$$P_\theta\{\theta \in K^*(Y)\} = P_\theta\{Y \in \{Y_0^*\}\} \geq 1 - \alpha ,$$

that is, $\{Y_0^*\}$ defines another test with maximal risk $\alpha$ of the first kind. Since $\{Y_0\}$ generates a UMP-test, we obtain

$$P_\theta\{\theta_0 \in K^*(Y)\} \geq P_\theta\{\theta_0 \in K(Y)\}$$

for all $\theta \neq \theta_0 \in \Omega$ and therefore

$$P_\theta\{\theta^* \in K^*(Y)\} \geq P_\theta\{\theta^* \in K(Y)\} \text{ for all } \theta^* \neq \theta .$$


The equivalence given in the above theorem means that a realised confidence interval with coefficient $1 - \alpha$ contains a subset $\omega$ of $\Omega$ so that $H_0: \theta = \theta_0$ would be accepted for all $\theta_0 \in \omega$ if $Y$ is a realisation of $\boldsymbol{Y}$.

The next theorem is a consequence of Theorems 3.19 and 3.8 and its 
Corollary 3.3, respectively. 


Theorem 3.20 If $P^*$ is under the assumptions of Theorem 3.8 a family of continuous distributions with distribution functions $F_\theta(T)$, then there exists for each $\alpha$ with $0 < \alpha < 1$ a UMP $(1 - \alpha)$-interval $K_L(Y)$ according to Definition 3.10. If the equation $F_\theta(T) = P_\theta\{T(Y) < T\} = 1 - \alpha$ has a solution $\hat\theta \in \Omega$, then it is unique. Further $\theta_u(Y) = \hat\theta$.

Proof: The elements of $P^*$ are continuous distributions. Hence, to each $\theta_0$ there exists a number $T_{1-\alpha} = T_{1-\alpha}(\theta_0)$ so that $P_{\theta_0}\{T(Y) > T_{1-\alpha}\} = \alpha$. Taking (3.24) into account, $Y_A(\theta_0) = \{T : T > T_{1-\alpha}(\theta_0)\}$ is the rejection region of a UMP-$\alpha$-test for $H_0: \theta = \theta_0$ against $H_A: \theta = \theta_A$. Then $Y_0(\theta_0) = \{T : T \leq T_{1-\alpha}(\theta_0)\}$ is the corresponding acceptance region. Now let $K(Y)$ be given by (3.50). Since $T_{1-\alpha}(\theta_0)$ is strictly monotone in $\theta_0$ (the test is unbiased), $K(Y)$ consists of all $\theta \in \Omega$ with $\theta_u(Y) \leq \theta$, where $\theta_u(Y) = \min_{\theta \in \Omega}\{\theta, T(Y) \leq T_{1-\alpha}(\theta)\}$. This implies the first assertion in Theorem 3.20.

It follows from Corollary 3.1 of Theorem 3.1 that $F_\theta(T)$ is strictly antitone in $\theta$ for each fixed $T$, provided that $0 < F_\theta(T) < 1$ holds for this $T$. Therefore the equation $F_\theta(T) = 1 - \alpha$ has at most one solution. Let $\hat\theta$ be such a solution, that is, let $F_{\hat\theta}(T) = 1 - \alpha$. Then $T_{1-\alpha}(\hat\theta) = T$ follows, and the inequalities $T \leq T_{1-\alpha}(\theta_0)$ and $T_{1-\alpha}(\hat\theta) \leq T_{1-\alpha}(\theta_0)$ or $\hat\theta \leq \theta_0$ are equivalent. But this means $\theta_u(Y) = \hat\theta$. Hence, $\theta_u(Y)$ is obtained by solving the equation $T(Y) = T_{1-\alpha}(\theta)$ in $\theta$.

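Example 3.17 Under the conditions of Example 3.4, we look for a UMP $(1 - \alpha)$-interval for $\mu$. Now $T(Y) = \bar{y}$ is distributed as $N\left(\mu, \dfrac{\sigma^2}{n}\right)$. Therefore $T_{1-\alpha}(\mu)$ is the $(1 - \alpha)$-quantile of a $N\left(\mu, \dfrac{\sigma^2}{n}\right)$-distribution. Considering $\theta = \mu$ we must first solve the equation $F_\mu[T(Y)] = 1 - \alpha$. Because of $T_{1-\alpha}(\mu) = \mu + z_{1-\alpha}\dfrac{\sigma}{\sqrt{n}}$, the desired UMP $(1 - \alpha)$-interval for $\mu$ has the form $\left[\bar{y} - z_{1-\alpha}\dfrac{\sigma}{\sqrt{n}}, +\infty\right)$.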
Example 3.18 Starting with the random sample of Example 3.5, we want to construct a one-sided confidence interval with coefficient $1 - \alpha$ for $\sigma^2$ (where $\mu$ is known). Using the sufficient statistic $T(Y) = \sum_{i=1}^{n}(y_i - \mu)^2$, the region $Y_0$ for accepting $H_0: \sigma^2 = \sigma_0^2$ is given by the inequality $T(Y) \leq \sigma_0^2\, CS(n|1-\alpha)$. Here $CS(n|1-\alpha)$ is the $(1 - \alpha)$-quantile of the chi-squared distribution with $n$ degrees of freedom. The quantiles are shown in Table D.4. Now the left end of $K(Y)$, written as $\sigma_u^2(Y)$, is determined by

$$\sigma_u^2(Y) = \min_{\sigma_0^2}\{\sigma_0^2, T(Y) \leq \sigma_0^2\, CS(n|1-\alpha)\}.$$

Hence, $[\sigma_u^2(Y), +\infty)$ with the left end

$$\sigma_u^2(Y) = \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{CS(n|1-\alpha)}$$

is for each $\alpha$ ($0 < \alpha < 1$) a UMP $(1 - \alpha)$-interval for $\sigma^2$.

Analogously, the reader can as an exercise transform other UMP-$\alpha$-tests into corresponding UMP $(1 - \alpha)$-intervals.

If under the assumptions of Theorem 3.8 the distribution of $T(Y)$ is discrete, then the tests are randomised. Thus the corresponding confidence intervals are also called randomised. In general we do not want to deal with such confidence intervals. However, in practical applications they are often needed for the parameter $p$ of the binomial distribution, which represents a probability. Here we refer to Fleiss et al. (2003) and to the case of two-sided intervals in Section 3.5.2.


3.5.2 Two-Sided Confidence Intervals in One-Parametric and 
Confidence Intervals in Multi-Parametric Distribution Families 


Definition 3.11 A two-sided confidence interval $K(Y)$ with coefficient $1 - \alpha$ is said to be a uniformly most powerful interval, if $K(Y)$ is in the class

$$K_\alpha = \{K(Y), P_\theta[\theta \in K(Y)] \geq 1 - \alpha \text{ for all } \theta \in \Omega\} \quad (3.51)$$

and fulfils the condition

$$P_\theta[\theta^* \in K(Y)] = \min_{K^*(Y) \in K_\alpha} P_\theta\{\theta^* \in K^*(Y)\} \text{ for all } \theta^* \neq \theta \in \Omega. \quad (3.52)$$

Analogous to Section 3.5.1 for continuous distributions, we can construct two-sided uniformly most powerful $(1 - \alpha)$-intervals on the basis of UMP-$\alpha$-tests for $H_0: \theta = \theta_0$ against $H_A: \theta \neq \theta_0$. But generally such tests do not exist for all $\alpha$, and therefore we introduce a weaker optimality condition analogous to the UMPU-tests.

Definition 3.12 A $(1 - \alpha)$-confidence interval $K(Y) = [l, u]$ is said to be an unbiased $(1 - \alpha)$-interval, if it lies in $K_\alpha$ and satisfies

$$P_\theta[\theta^* \in K(Y)] \leq 1 - \alpha \text{ for all } \theta^* \neq \theta \in \Omega. \quad (3.53)$$

Then we note briefly that $K(Y)$ is a U-$(1 - \alpha)$-interval. $K(Y)$ is said to be a uniformly most powerful unbiased $(1 - \alpha)$-confidence interval, if it fulfils the conditions (3.51) and (3.53) as well as a condition analogous to (3.52), where the minimum is taken within the subclass of $K_\alpha$ consisting of those $K(Y)$ satisfying both (3.51) and (3.53). We denote uniformly most powerful unbiased $(1 - \alpha)$-confidence intervals shortly as UMPU $(1 - \alpha)$-intervals.

If $\theta = (\lambda, \eta_2, \ldots, \eta_k)^T$ is a parameter vector and if a confidence interval with respect to the real component $\lambda$ is to be designed, then we can generalise with $\eta = (\eta_2, \ldots, \eta_k)^T$ the Definitions 3.9 and 3.3 by replacing the demand 'for all $\theta$' by the demand 'for all $\lambda$ and $\eta$'. If a UMPU-$\alpha$-test exists, then it is easy to see that the procedure described in Section 3.5.1 can be used to construct a UMPU $(1 - \alpha)$-interval. We want to demonstrate this by presenting some examples. But first we must mention the fact that UMPU $(1 - \alpha)$-intervals satisfy for continuous distributions the condition

$$P_\theta[\theta \in K(Y)] = 1 - \alpha .$$
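Example 3.19 Under the conditions of Example 3.9, we want to construct a UMPU $(1 - \alpha)$-interval for $\sigma^2$. For this purpose we use the sufficient statistic $T(Y) = \sum_{i=1}^{n} y_i^2$ and introduce

$$\{Y_0\} = A(\sigma^2) = \left\{ c_{1\alpha} \leq \frac{1}{\sigma^2} T(Y) \leq c_{2\alpha} \right\},$$

where $c_{1\alpha}$ and $c_{2\alpha}$ fulfil (3.33) and (3.34). Observing

$$A(\sigma^2) = \left\{ \frac{1}{c_{2\alpha}} \leq \frac{\sigma^2}{T(Y)} \leq \frac{1}{c_{1\alpha}} \right\}$$

and passing to random variables shows that

$$K(Y) = \left[ \frac{1}{c_{2\alpha}} \sum_{i=1}^{n} y_i^2, \; \frac{1}{c_{1\alpha}} \sum_{i=1}^{n} y_i^2 \right]$$

is a two-sided UMPU $(1 - \alpha)$-interval for $\sigma^2$.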


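Example 3.20 On the basis of Example 3.11, a UMPU $(1 - \alpha)$-interval is to be constructed for the expectation $\mu$ of a normal distribution with unknown variance. Because of (3.36) we get

$$\{Y_0\} = A(\mu) = \left\{ t\left(n - 1 \,\middle|\, \frac{\alpha}{2}\right) \leq \frac{\bar{y} - \mu}{s}\sqrt{n} \leq t\left(n - 1 \,\middle|\, 1 - \frac{\alpha}{2}\right) \right\},$$

and therefore

$$K(Y) = \left[ \bar{y} - t\left(n - 1 \,\middle|\, 1 - \frac{\alpha}{2}\right)\frac{s}{\sqrt{n}}, \; \bar{y} + t\left(n - 1 \,\middle|\, 1 - \frac{\alpha}{2}\right)\frac{s}{\sqrt{n}} \right]$$

is a UMPU $(1 - \alpha)$-interval for $\mu$.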


Example 3.21 On the basis of Example 3.15, a UMPU $(1 - \alpha)$-interval for the difference $\mu_1 - \mu_2$ is to be constructed. It follows from (3.46) and the form of the UMPU-$\alpha$-test $k(Y)$ given afterwards, if in the numerator of (3.46) the expression $\mu_1 - \mu_2$ is inserted (which is 0 under $H_0$), that

$$K(Y) = \left[ \bar{y}_1 - \bar{y}_2 - t\left(n_1 + n_2 - 2 \,\middle|\, 1 - \frac{\alpha}{2}\right) s \sqrt{\frac{n_1 + n_2}{n_1 n_2}}, \; \bar{y}_1 - \bar{y}_2 + t\left(n_1 + n_2 - 2 \,\middle|\, 1 - \frac{\alpha}{2}\right) s \sqrt{\frac{n_1 + n_2}{n_1 n_2}} \right]$$

is a UMPU $(1 - \alpha)$-interval. In this case we also propose to use instead confidence intervals that are based on the Welch test.

If the distribution modelling the character is discrete as, for example, in the case of the binomial distribution, then exact tests are for all $\alpha$ always randomised tests. If the demand is slightly weakened by looking for a confidence interval that covers the parameter $p$ at least with probability $1 - \alpha$, then an exact interval $K(Y) = [l, u]$ can be constructed according to Clopper and Pearson (1934) as follows. If $[l, u]$ is a realised confidence interval and $y$ the observed value of the random variable $\boldsymbol{y}$ distributed as $B(n, p)$, then the endpoints $l$ and $u$ can be determined so that

$$\sum_{i=y}^{n} \binom{n}{i} l^i (1 - l)^{n-i} = \alpha_1$$

and

$$\sum_{i=0}^{y} \binom{n}{i} u^i (1 - u)^{n-i} = \alpha_2$$

hold, where $\alpha_1 + \alpha_2 = \alpha$ for given $\alpha$, and $\alpha$ is independent of $y$.
We put $l = 0$ and $u = 1 - \left(\dfrac{\alpha}{2}\right)^{1/n}$ for $y = 0$ as well as $l = \left(\dfrac{\alpha}{2}\right)^{1/n}$ and $u = 1$ for $y = n$.

The other values can be calculated according to Stevens (1950) with the help of quantiles of the beta distribution (with the parameters $x$ and $n - x + 1$ for the lower endpoint and $x + 1$ and $n - x$ for the upper endpoint), for instance, with R using the commands

    l <- qbeta(alfa/2, X, n - X + 1)
and
    u <- qbeta(1 - alfa/2, X + 1, n - X),

respectively. The Clopper-Pearson intervals can also be calculated with the R command binom.test. In SPSS these confidence intervals cannot be found in the menu bar.
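
The two defining equations can also be solved numerically without the beta distribution. The following Python sketch (function names are illustrative; it assumes the equal split $\alpha_1 = \alpha_2 = \alpha/2$) finds $l$ and $u$ by bisection on the binomial tail probabilities and treats $y = 0$ and $y = n$ as described above.

```python
from math import comb

def binom_tail_ge(y, n, p):
    """P(Y >= y) for Y distributed as B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(y, n + 1))

def solve_increasing(func, target, tol=1e-12):
    """Bisection on [0, 1] for func(p) = target, func increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(y, n, alpha=0.05):
    """Realised interval [l, u] with alpha_1 = alpha_2 = alpha / 2."""
    a1 = a2 = alpha / 2
    # lower end: P(Y >= y | p = l) = alpha_1, and l = 0 for y = 0
    l = 0.0 if y == 0 else solve_increasing(lambda p: binom_tail_ge(y, n, p), a1)
    # upper end: P(Y <= y | p = u) = alpha_2, i.e. P(Y >= y + 1 | p = u) = 1 - alpha_2
    u = 1.0 if y == n else solve_increasing(lambda p: binom_tail_ge(y + 1, n, p), 1 - a2)
    return l, u

for y_obs in (0, 5, 10):
    l, u = clopper_pearson(y_obs, 10)
    print(y_obs, round(l, 4), round(u, 4))
```

For $y = 0$ and $y = n$ the numerical solution reproduces the closed forms $u = 1 - (\alpha/2)^{1/n}$ and $l = (\alpha/2)^{1/n}$ given above.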

The minimal covering probability is for $n \geq 10$ and for all $p$ at least $1 - (\alpha_1 + \alpha_2) - 0.005$, but in most cases larger than $1 - \alpha$, that is, the intervals are conservative. This was shown by Pires and Amado (2008). Both authors compared 20 construction methods for two-sided confidence intervals regarding the covering probability and the expected interval width using extensive simulation experiments. The study found that a method of Agresti and Coull (1998) had slight advantages in comparison with the Clopper-Pearson intervals. But we do not want to go into the matter here.

The needed sample size can be obtained in R with the command size.prop.confint by calculating confidence intervals via normal approximation (see Rasch et al., 2011a, p. 31).


3.5.3 Table for Sample Sizes


We present in Table 3.5 a list of formulae for determining suitable sample sizes for confidence estimation. Note that for location parameters either the width or, if it is random, the expected width of the interval has to be specified in advance so that it lies below a reasonable bound $2\delta$.


Table 3.5 Sample size for the construction of two-sided $(1 - \alpha)$-confidence intervals with half expected width $\delta$.


Parameter Sample size 
z op 
2 ed ay >) ee 
M n= |t (n-41-5) a ro 
I? (> (n-1) 
P With R via size.prop.confint 
[ 2(7). 
P ; 2 a ar (5) on 
Hx — Hy paired observations n=|t (n -1;1- ) a zi 


Hx — Hy independent samples, equal variances n= | 20 


Ox (Ox +0. a 
Hx — Hy independent samples, unequal Nx = [aera e G *S1- ] ; 
variances 
3.6 Sequential Tests


3.6.1 Introduction


Until now a sample of fixed size $n$ was given. The task of statistical design of experiments is to determine $n$ so that the test satisfies certain precision requirements of the user. We have demonstrated this procedure in the previous sections.

For testing the null hypothesis that the expectation of a normal distribution with unknown variance takes a particular value against a one-sided alternative hypothesis, the sample size has to be determined after fixing the risks $\alpha$, $\beta$ and the minimum difference $\delta$ as

$$n = \left\lceil \left[\{t(n-1|1-\alpha) + t(n-1|1-\beta)\}\frac{\sigma}{\delta}\right]^2 \right\rceil \quad (3.54)$$

according to Section 3.4.1. Apart from the fact that (3.54) can only be solved iteratively, it also needs a priori information about $\sigma^2$. Therefore Stein (1945) proposed a method of realising a two-stage experiment. In the first stage a sample of size $n_0 > 1$ is drawn to estimate $\sigma^2$ by the variance $s_0^2$ of this sample and to calculate the sample size $n$ of the method using (3.54). In the second stage $n - n_0$ further measurements are taken. Following the original method of Stein, at least one further measurement is necessary in the second stage from a theoretical point of view. In this subsection we simplify this method by introducing the condition that no further measurements are to be taken for $n - n_0 \leq 0$. Nevertheless, this supplies an $\alpha$-test of acceptable power.
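
The iterative solution of (3.54) can be sketched as a fixed-point iteration. The following Python example is an illustration under stated assumptions and not the algorithm of any particular package: the t-quantiles are obtained by numerical integration (Simpson's rule) and bisection because the standard library has no t-distribution, and the starting value and function names are chosen freely. In the second stage of Stein's method, $\sigma$ would be replaced by the estimate $s_0$ from the first stage.

```python
import math

def t_cdf(t, df, steps=2000):
    """Distribution function of the central t-distribution (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    c = math.exp(log_c)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps                       # steps must be even
    s = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * pdf(k * h)
    half = s * h / 3                        # P(0 < T < |t|)
    return 0.5 + half if t >= 0 else 0.5 - half

def t_quantile(p, df):
    """P-quantile by bisection; assumes 0.5 < p < 1."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sample_size(alpha, beta, sigma, delta, start=1000, max_iter=50):
    """Iterate n = ceil([{t(n-1|1-alpha) + t(n-1|1-beta)} * sigma / delta]^2)."""
    n = start
    for _ in range(max_iter):
        q = t_quantile(1 - alpha, n - 1) + t_quantile(1 - beta, n - 1)
        n_new = max(2, math.ceil((q * sigma / delta) ** 2))
        if n_new == n:
            break
        n = n_new
    return n

# Hypothetical precision requirement: alpha = 0.05, beta = 0.1, delta = sigma
n_required = sample_size(0.05, 0.1, sigma=1.0, delta=1.0)
print(n_required)
```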

Since both parts of the experiment are carried out one after the other, 
such experiments are called sequential. Sometimes it is even tenable to 
make all measurements step by step, where each measurement is followed 
by calculating a new test statistic. A sequential testing of this kind can be 
used, if the observations of a random variable in an experiment take place 
successive in time. Typical examples are series of single experiments in a 
laboratory, psychological diagnostics in single sessions and medical treat- 
ments of patients in hospitals, consultations of clients of certain institu- 
tions and certain procedures of statistical quality control, where the 
sequential approach was used the first time (Dodge and Romig, 1929). 
The basic idea is to utilise the observations already made before the next 
are at hand. 

For example, testing the hypothesis Ho: v = Wo against Ha: > Mo there are 
three possibilities in each step of evaluation, namely, 


1) Accept Hp. 
2) Reject Hp. 
3) Continue the investigation. 

Comparing sequential tests with tests of fixed size, the advantage of the former is that on average fewer experimental units are needed over a long series of investigations. But a decision between the three cases mentioned above is only possible for a priori given values of $\alpha$, $\beta$ and $\delta$. For tests of fixed size, by contrast, this a priori information is not compulsory.

Nevertheless, we will deal only briefly with the theory of sequential tests, for two good reasons. Firstly, the still unsurpassed textbook of Wald (1947) has been reprinted and is therefore generally available (Wald, 1947/2004), and newer results can be found in the books of Ghosh and Sen (1991) and DeGroot (2005). Secondly, we do not recommend the application of this general theory; we recommend closed plans, which end after finitely many steps with certainty (and not only with probability 1).

We start with some concepts. 


Definition 3.13 Let a sequence $S = \{y_1, y_2, \ldots\}$ of random variables (a stochastic process) be given, where the components are identically and stochastically independently distributed as $P_\theta \in P = \{P_\theta, \theta \in \Omega\}$. Let the parameter space $\Omega$ consist of two different elements, $\theta_0$ and $\theta_A$. Besides, let $y_i \in \{Y\} \subset R^1$. Concerning the testing of the hypotheses $H_0: \theta = \theta_0$; $H_A: \theta = \theta_A$, we suppose that for each $n$ in the above sequence a decomposition $\{M_0^n, M_A^n, M_F^n\}$ of

$$\{Y^n\} = \{y_1\} \times \{y_2\} \times \cdots \times \{y_n\} \subset R^n$$

with

$$M_0^n \cup M_A^n \cup M_F^n = M_F^{n-1} \times \{y_n\} \subset R^n$$

exists. Then the sets $M_0^n, M_A^n, M_F^n$ ($n = 1, 2, \ldots$) define together with the prescription

$$(y_1, \ldots, y_n) \in \begin{cases} M_0^n: & \text{acceptance of } H_0: \theta = \theta_0 \\ M_A^n: & \text{rejection of } H_0: \theta = \theta_0 \\ M_F^n: & \text{continuation, observe } y_{n+1} \end{cases}$$

a sequential test with respect to the pair $H_0: \theta = \theta_0$; $H_A: \theta = \theta_A$. $M_0^n$ and $M_A^n$ are called final decisions. The pair $(\alpha, \beta)$ of risks is called the strength of a sequential test.


Definition 3.14 Let a sequence $S = \{y_1, y_2, \ldots\}$ of random variables be given where the components are identically and stochastically independently distributed as $P_\theta \in P = \{P_\theta, \theta \in \Omega\}$. Let the parameter space $\Omega$ consist of the two different elements $\theta_0$ and $\theta_A$. A sequential test for $H_0: \theta = \theta_0$ against $H_A: \theta = \theta_A$, based on the ratio

$$LR_n = \frac{L(Y^n \mid \theta_A)}{L(Y^n \mid \theta_0)}$$

of the likelihood functions $L(Y^n \mid \theta)$ of both parameter values and on the first $n$ elements $Y^n = \{y_1, y_2, \ldots, y_n\}$ of the sequence $S$, is said to be a sequential likelihood ratio test (SLRT) if for certain numbers $A$ and $B$ with $0 < B < 1 < A$ the decomposition of $\{Y^n\}$ reads

$$M_0^n = \{Y^n: LR_n \le B\}, \quad M_A^n = \{Y^n: LR_n \ge A\}, \quad M_F^n = \{Y^n: B < LR_n < A\}.$$


Theorem 3.21 A sequential likelihood ratio test (SLRT) that leads with probability 1 to a final decision with the strength $(\alpha, \beta)$ fulfils with the numbers $A$ and $B$ from Definition 3.14 the conditions

$$A \le \frac{1-\beta}{\alpha}, \tag{3.55}$$

$$B \ge \frac{\beta}{1-\alpha}. \tag{3.56}$$

In applications the equalities in (3.55) and (3.56) are often used to calculate the bounds $A$ and $B$ approximately. Such tests are called approximate tests.

It follows from the theory that SLRTs can hardly be recommended, since under certain assumptions they end only with probability 1. On the other hand, they are the most powerful tests for a given strength in the sense that the expectation of the sample size, the average sample number (ASN), is minimal for such tests and smaller than the size of tests with fixed sample size. Since it is unknown at which maximal sample size the SLRT ends with certainty, it belongs to the class of open sequential tests. In contrast, there are also closed sequential tests, that is, tests with a guaranteed maximal sample size, but this advantage is paid for by a somewhat larger ASN.


3.6.2 Wald’s Sequential Likelihood Ratio Test for One-Parametric 
Exponential Families 


All results are presented without proofs. The interested reader can find proofs in 
the book of Wald (1947). Some results come from an unpublished manuscript 
of B. Schneider (1992). 

We thank him for the permission to use the results for our book. 

Let a sequence $S = \{y_1, y_2, \ldots\}$ of identically and independently distributed random variables be given, which are distributed as $y$ with the same likelihood function $f(y, \theta)$. We test the null hypothesis

$$H_0: \theta = \theta_0 \quad [f(y, \theta) = f(y, \theta_0)]$$

against the alternative

$$H_A: \theta = \theta_1 \quad [f(y, \theta) = f(y, \theta_1)],$$

where $\theta_0 \ne \theta_1$ and $\theta_0, \theta_1 \in \Omega \subset R^1$.
The realised likelihood ratio after $n$ observations is

$$LR_n = \prod_{i=1}^{n} \frac{f(y_i; \theta_1)}{f(y_i; \theta_0)}. \tag{3.57}$$


The following questions arise:


• How do we choose the numbers $A$ and $B$ in (3.55) and (3.56)?
• What is the mean size $E(n \mid \theta)$ of the sequence $\{y_1, y_2, \ldots\}$?


Wald used the following approximations for $A$, $B$. If the nominal risks of the first and the second kind are given by $\alpha_{nom}$ and $\beta_{nom}$, then the real (actual) risks $\alpha_{act}$ and $\beta_{act}$ satisfy

$$\alpha_{act} \le \frac{1}{A} = \alpha_{nom}; \quad \beta_{act} \le B = \beta_{nom}.$$

Hence, the approximate test introduced in the preceding subsection is conservative. This supplies the relations in (3.55) and (3.56). The corresponding bounds are called Wald bounds.


Example 3.22 Assuming that the nominal risks of the first and the second kind are 0.05 and 0.1, respectively, the relations (3.55) and (3.56) lead (in the equality case) to the values $A = 0.9/0.05 = 18$ and $B = 0.1/0.95 = 0.10526$. Therefore we have to continue the process as long as $0.10526 < LR_n < 18$ is fulfilled. In a system of coordinates with $n$ on the abscissa and $LR_n$ on the ordinate, the zone of continuation lies between two parallel lines.
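
The arithmetic of Example 3.22 can be checked with a few lines of Python. This is only a minimal sketch: the function names `wald_bounds` and `slrt_decision` are illustrative choices and do not come from the text or from the OPDOE package; the bounds are the equality cases of (3.55) and (3.56).

```python
def wald_bounds(alpha, beta):
    """Approximate Wald bounds: equality cases of (3.55) and (3.56)."""
    upper = (1.0 - beta) / alpha      # A
    lower = beta / (1.0 - alpha)      # B
    return upper, lower


def slrt_decision(lr, upper, lower):
    """Decision after one evaluation step, given the likelihood ratio LR_n."""
    if lr >= upper:
        return "reject H0"
    if lr <= lower:
        return "accept H0"
    return "continue"


A, B = wald_bounds(alpha=0.05, beta=0.10)
print(f"A = {A:.0f}, B = {B:.5f}")
print(slrt_decision(1.0, A, B))
```

With the nominal risks of the example, the continuation zone is the open interval between the two printed bounds.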

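The (approximate) power function of the SLRT is

$$\pi(\theta) \approx \frac{1 - \left(\frac{\beta}{1-\alpha}\right)^{h(\theta)}}{\left(\frac{1-\beta}{\alpha}\right)^{h(\theta)} - \left(\frac{\beta}{1-\alpha}\right)^{h(\theta)}} \quad \text{for } h(\theta) \ne 0. \tag{3.58}$$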

The function $h(\theta)$ in (3.58) is uniquely defined in the continuous case by the equation

$$\int \left[\frac{f(y, \theta_1)}{f(y, \theta_0)}\right]^{h(\theta)} f(y, \theta)\, dy = 1$$

and in the discrete case by the equation

$$\sum_{\forall i} \left[\frac{f(y_i, \theta_1)}{f(y_i, \theta_0)}\right]^{h(\theta)} f(y_i, \theta) = 1.$$

Wald showed that for sequential likelihood ratio tests, the expected (average) sample size ASN is minimal among all sequential tests with risks not larger than $\alpha_{nom}$ and $\beta_{nom}$, provided that one of the two hypotheses is true.

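With the notations

$$z = \ln\frac{f(y, \theta_1)}{f(y, \theta_0)}, \quad z_i = \ln\frac{f(y_i, \theta_1)}{f(y_i, \theta_0)} \tag{3.59}$$

we get $\ln LR_n = \sum z_i$. For $E(|z|) < \infty$ Wald showed also

$$E(n \mid \theta) \approx \frac{\pi(\theta)\ln A + [1 - \pi(\theta)]\ln B}{E(z \mid \theta)}, \quad \text{if } E(z \mid \theta) \ne 0. \tag{3.60}$$

The experiment ends if in the current step at least one inequality sign becomes an equality sign in

$$\alpha_{act} \le \frac{1}{A} = \alpha_{nom}; \quad \beta_{act} \le B = \beta_{nom}.$$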


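Wijsman (1991) presented an approximation for $E(n \mid \theta)$ that reads in the special case $\theta = \theta_0$

$$E(n \mid \theta_0) \approx \frac{1}{E(z \mid \theta_0)}\left[\frac{A-1}{A-B}\ln B + \frac{1-B}{A-B}\ln A\right] \tag{3.61}$$

and in the general case

$$E(n \mid \theta) \approx \frac{1}{E(z \mid \theta)}\left[\frac{A^{h(\theta)}-1}{A^{h(\theta)}-B^{h(\theta)}}\ln B + \frac{1-B^{h(\theta)}}{A^{h(\theta)}-B^{h(\theta)}}\ln A\right]. \tag{3.62}$$

In an exponential family the first derivative of $A(\theta)$ supplies the expectation of $y$ and the second derivative the variance of $y$.

Wald (1947) proved in the continuous case that there exists a $\theta^*$ with $E(z \mid \theta^*) = 0$ that fulfils $h(\theta^*) = 0$ in (3.58) and moreover that

$$E(n \mid \theta^*) \approx \frac{|\ln A| \cdot |\ln B|}{E(z^2 \mid \theta^*)}, \quad \text{if } h(\theta^*) = 0 \tag{3.63}$$

holds.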


Example 3.23 We consider a one-parametric exponential family with density function $f(y, \theta) = h(y)e^{y\eta - A(\eta)}$.
We want to test

$$H_0: \theta = \theta_0 \quad (\eta = \eta_0)$$

against

$$H_A: \theta = \theta_1 \quad (\eta = \eta_1), \quad \theta_0 < \theta_1 \quad (\eta_0 < \eta_1)$$

with $\eta_i = \eta(\theta_i)$; $i = 0, 1$. For $\theta_0 > \theta_1$ we interchange the hypotheses.
The variables $z_i$ can be written (in a realised form) as $z_i = (\eta_1 - \eta_0)y_i - [A(\eta_1) - A(\eta_0)]$. We continue while

$$\ln B < (\eta_1 - \eta_0)\sum_{i=1}^{n} y_i - n[A(\eta_1) - A(\eta_0)] < \ln A$$

and, because of $\eta_1 - \eta_0 > 0$, while

$$b_l^n = \frac{\ln B + n[A(\eta_1) - A(\eta_0)]}{\eta_1 - \eta_0} < \sum_{i=1}^{n} y_i < \frac{\ln A + n[A(\eta_1) - A(\eta_0)]}{\eta_1 - \eta_0} = b_u^n \tag{3.64}$$

is satisfied.
For $\eta_1 - \eta_0 < 0$ we continue if

$$b_l^n > \sum_{i=1}^{n} y_i > b_u^n$$

holds with the bounds $b_l^n$ and $b_u^n$ from (3.64). W.l.o.g. we restrict ourselves to the case $\eta_1 - \eta_0 > 0$.

In the discrete case the path of the random process between the parallel lines is a step function. Therefore it cannot be guaranteed that the Wald bound is met exactly in the last step. In such cases an algorithm of Young (1994) is useful, which is described now.

Suppose that the test ends with the $n$th observation. The probability of obtaining a value $t_n$ of the variable $\boldsymbol{t}_n = \sum_{i=1}^{n} y_i$ after $n$ units were observed is the sum of the probabilities of all sequences that fulfil the conditions $b_l^i < t_i < b_u^i$; $i = 1, 2, \ldots, n-1$ and $\boldsymbol{t}_n = t_n$. We write this probability as

$$P(\boldsymbol{t}_n = t_n) = \sum_{j=b_l^{n-1}}^{b_u^{n-1}} P(\boldsymbol{t}_n = t_n \mid \boldsymbol{t}_{n-1} = j)\, P(\boldsymbol{t}_{n-1} = j) = \sum_{j=b_l^{n-1}}^{b_u^{n-1}} f(t_n - j; \theta)\, P(\boldsymbol{t}_{n-1} = j).$$

We start with $P(\boldsymbol{t}_0 = 0) = 1$ and determine all further probabilities by recursion.
For fixed $\theta$ the probability of accepting $H_A$ at the $n$th observation is

$$P(\boldsymbol{t}_n > b_u^n) = \sum_{j=b_l^{n-1}}^{b_u^{n-1}} \sum_{k=b_u^n-j+1}^{\infty} f(k; \theta)\, P(\boldsymbol{t}_{n-1} = j) = \sum_{j=b_l^{n-1}}^{b_u^{n-1}} [1 - F(b_u^n - j; \theta)]\, P(\boldsymbol{t}_{n-1} = j), \tag{3.65}$$

where $F$ is the distribution function.
For fixed $\theta$ the probability of accepting $H_0$ at the $n$th observation is

$$P(\boldsymbol{t}_n < b_l^n) = \sum_{j=b_l^{n-1}}^{b_u^{n-1}} \sum_{k=0}^{b_l^n-j-1} f(k; \theta)\, P(\boldsymbol{t}_{n-1} = j) = \sum_{j=b_l^{n-1}}^{b_u^{n-1}} F(b_l^n - j - 1; \theta)\, P(\boldsymbol{t}_{n-1} = j). \tag{3.66}$$
The power function is given by $\sum_{i=1}^{n} P(\boldsymbol{t}_i < b_l^i)$ if the procedure ends with step $n$, and the probability for this event is equal to $P(\boldsymbol{t}_n < b_l^n) + P(\boldsymbol{t}_n > b_u^n)$.
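
The recursion can be illustrated for Bernoulli observations, where $t_n$ is the number of successes. The sketch below is an illustration under stated assumptions, not the algorithm of Young (1994) as published: the function name, the truncation at `n_max` steps and the use of the log-likelihood ratio in place of the bounds $b_l^n$, $b_u^n$ are choices made here. It propagates $P(t_n = j)$ over the continuation zone and accumulates the probability mass that leaves the zone on either side.

```python
import math


def bernoulli_slrt_exact(p, p0, p1, alpha, beta, n_max=2000):
    """Stopping probabilities of the Bernoulli SLRT when the success probability is p.

    Returns (P(reject H0), P(accept H0), mass still in the continuation zone
    after n_max steps). Requires 0 < p0 < p1 < 1.
    """
    ln_a = math.log((1.0 - beta) / alpha)         # ln A, Wald bound
    ln_b = math.log(beta / (1.0 - alpha))         # ln B, Wald bound
    step_up = math.log(p1 / p0)                   # z_i for y_i = 1
    step_down = math.log((1.0 - p1) / (1.0 - p0)) # z_i for y_i = 0
    inside = {0: 1.0}                             # start: P(t_0 = 0) = 1
    p_reject = 0.0
    p_accept = 0.0
    for n in range(1, n_max + 1):
        nxt = {}
        for j, prob in inside.items():            # one recursion step
            nxt[j] = nxt.get(j, 0.0) + (1.0 - p) * prob
            nxt[j + 1] = nxt.get(j + 1, 0.0) + p * prob
        inside = {}
        for t, prob in nxt.items():
            log_lr = t * step_up + (n - t) * step_down
            if log_lr >= ln_a:
                p_reject += prob                  # path leaves through the upper bound
            elif log_lr <= ln_b:
                p_accept += prob                  # path leaves through the lower bound
            else:
                inside[t] = prob                  # path stays in the continuation zone
        if not inside:
            break
    return p_reject, p_accept, sum(inside.values())


rej0, acc0, rest0 = bernoulli_slrt_exact(0.3, 0.3, 0.5, alpha=0.05, beta=0.1)
rej1, acc1, rest1 = bernoulli_slrt_exact(0.5, 0.3, 0.5, alpha=0.05, beta=0.1)
print(rej0, acc1)
```

The two printed numbers are the actual risks of the first and second kind for these (arbitrarily chosen) values $p_0 = 0.3$, $p_1 = 0.5$; they stay below $1/A$ and $B$, respectively, which reflects the conservative character of the Wald bounds.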


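In the following example we use one-sided hypotheses with $\alpha = 0.05$, $\beta = 0.1$ and $\delta = \theta_1 - \theta_0 = 0.1$.


Example 3.24 Normal Distribution with Known Variance
If $y$ is distributed as $N(\mu; \sigma^2)$ with known $\sigma^2$, then we get

$$z = \frac{1}{2\sigma^2}\left[2y(\mu_1 - \mu_0) + \mu_0^2 - \mu_1^2\right],$$

$$E(z \mid \mu) = \frac{1}{2\sigma^2}\left[2\mu(\mu_1 - \mu_0) + \mu_0^2 - \mu_1^2\right].$$

We test

$$H_0: \mu = \mu_0$$

against the alternative

$$H_A: \mu = \mu_1; \quad \mu_0 \ne \mu_1; \quad \mu \in R^1.$$

For $\mu_1 - \mu_0 = \delta$ and $\theta = \mu$, we obtain in (3.58) with $h(\theta) = h(\mu)$

$$\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{\frac{h(\mu)}{2\sigma^2}\left[2y\delta + \mu_0^2 - \mu_1^2\right]}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy = 1.$$

Finally an R-routine in OPDOE supplies $E(n \mid \mu)$ and $\pi(\mu)$ as functions of $\mu$.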
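
For the normal case the integral equation can be solved in closed form: since $z$ is itself normally distributed, $E(e^{hz}) = 1$ gives $h(\mu) = (\mu_0 + \mu_1 - 2\mu)/(\mu_1 - \mu_0)$. This closed form is not stated in the text; it is used in the following Python sketch, which evaluates the Wald approximations (3.58), (3.60) and (3.63). The function name is an illustrative choice, and the sketch is not the OPDOE routine.

```python
import math


def normal_slrt_approx(mu, mu0, mu1, sigma, alpha, beta):
    """Approximate power and ASN of the SLRT for N(mu, sigma^2), sigma known."""
    ln_a = math.log((1.0 - beta) / alpha)          # ln A
    ln_b = math.log(beta / (1.0 - alpha))          # ln B
    delta = mu1 - mu0
    h = (mu0 + mu1 - 2.0 * mu) / delta             # solves E(exp(h z)) = 1
    ez = (2.0 * mu * delta + mu0**2 - mu1**2) / (2.0 * sigma**2)   # E(z | mu)
    if abs(h) < 1e-12:
        # limiting case h = 0, i.e. mu halfway between mu0 and mu1
        power = -ln_b / (ln_a - ln_b)
        asn = abs(ln_a) * abs(ln_b) / (delta**2 / sigma**2)        # (3.63)
    else:
        a_h = math.exp(h * ln_a)
        b_h = math.exp(h * ln_b)
        power = (1.0 - b_h) / (a_h - b_h)                          # (3.58)
        asn = (power * ln_a + (1.0 - power) * ln_b) / ez           # (3.60)
    return power, asn


power0, asn0 = normal_slrt_approx(0.0, 0.0, 0.1, 1.0, alpha=0.05, beta=0.1)
power1, asn1 = normal_slrt_approx(0.1, 0.0, 0.1, 1.0, alpha=0.05, beta=0.1)
print(power0, asn0, power1, asn1)
```

With $\delta = 0.1$ and $\sigma = 1$ (the value of $\sigma$ is an assumption made for this illustration), the approximate power equals $\alpha$ at $\mu_0$ and $1 - \beta$ at $\mu_1$, as it must with the Wald bounds.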


3.6.3 Test about Mean Values for Unknown Variances 


Now we deal with a two-parametric exponential family. We have to adapt the method from Section 3.6.1 to this case, because the presence of a nuisance parameter means that the method cannot be used directly. The parameter vector of an exponential family is $\theta = (\theta_1, \theta_2)^T$. For $\varphi_0 \ne \varphi_1$ we have to test

$$H_0: \varphi(\theta) \le \varphi_0; \ \varphi \in R^1 \quad \text{against} \quad H_A: \varphi(\theta) \ge \varphi_1; \ \varphi \in R^1$$

or

$$H_0: \varphi(\theta) = \varphi_0; \ \varphi \in R^1 \quad \text{against} \quad H_A: \varphi(\theta) \ne \varphi_0; \ \varphi \in R^1.$$

In this book, we consider the one-dimensional normal distribution; the corresponding test is called the sequential t-test.


The Sequential t-Test 

The normal distribution of a random variable $y$ is a two-parametric exponential family with the parameter vector $\theta = (\mu; \sigma^2)^T$ and the log-likelihood function (natural logarithm of the likelihood function)

$$l(\mu, \sigma^2; y) = -\ln\sqrt{2\pi} - \ln\sigma - \frac{(y-\mu)^2}{2\sigma^2}.$$

We put $\varphi(\theta) = \mu/\sigma$ and test

$$H_0: \frac{\mu}{\sigma} \le \varphi_0 \quad \text{against} \quad H_A: \frac{\mu}{\sigma} \ge \varphi_1$$

or

$$H_0: \frac{\mu}{\sigma} = \varphi_0 \quad \text{against} \quad H_A: \frac{\mu}{\sigma} \ne \varphi_0.$$

If we replace the nuisance parameter, as in the case of fixed sample size in Section 3.6.2, by its estimate, then $LR_n$ in Definition 3.14 is no longer a likelihood ratio.
We consider a sequence

$$z_1 = z_1(y_1); \ z_2 = z_2(y_1; y_2), \ldots$$

so that for each $n > 1$ the conditional likelihood function $f_u^{(n)}(z_1, z_2, \ldots, z_n; \varphi)$ of $(z_1, z_2, \ldots, z_n)$ depends on $\theta$ only via $\varphi(\theta)$.
Then we apply the theory of Section 3.6.1 with

$$LR_n^* = \prod_{i=1}^{n} \frac{f_u^{(n)}(z_i; \varphi_1)}{f_u^{(n)}(z_i; \varphi_0)}; \quad n > 1 \tag{3.67}$$

instead of

$$LR_n = \prod_{i=1}^{n} \frac{f(y_i; \theta_1)}{f(y_i; \theta_0)}.$$

The choice of the sequence $z_1 = z_1(y_1)$; $z_2 = z_2(y_1; y_2), \ldots$ is explained in the following.
Lehmann (1959) formulated the principle of invariant tests. If we multiply $\mu$ and $\sigma$ by a positive real number $c$, then the hypotheses $H_0: \mu/\sigma \le \varphi_0$ and $H_A: \mu/\sigma \ge \varphi_1$ remain unchanged, that is, they are invariant with respect to affine transformations. The random variables $y_i^* = cy_i$ are normally distributed with expectation $c\mu$ and standard deviation $c\sigma$. Therefore, the family of the distributions of $y_1, y_2, \ldots, y_n$ is for each $n \ge 1$ the same as that of $cy_1, cy_2, \ldots, cy_n$. In summary, both the hypotheses and the family of the distributions are invariant with respect to affine transformations. Now the sequential t-test can be implemented according to Eisenberg and Ghosh (1991) as follows:


• Specialise $LR_n^*$ in (3.67) for a normal distribution, that is,

$$LR_n^* = e^{-\frac{1}{2}(n - v_n^2)(\varphi_1^2 - \varphi_0^2)}\, \frac{\int_0^\infty t^{n-1} e^{-\frac{1}{2}(t - v_n\varphi_1)^2}\, dt}{\int_0^\infty t^{n-1} e^{-\frac{1}{2}(t - v_n\varphi_0)^2}\, dt}. \tag{3.68}$$

• Solve $LR_n^* = \frac{\beta}{1-\alpha}$ and $LR_n^* = \frac{1-\beta}{\alpha}$ with respect to $v_n$. We denote the solutions by $v_l^n$ and $v_u^n$, respectively.
• Calculate

$$v_n = \frac{\sum_{i=1}^{n} y_i}{\sqrt{\sum_{i=1}^{n} y_i^2}} \tag{3.69}$$

and continue while $v_l^n < v_n < v_u^n$ holds.
• Accept $H_0$ if $v_n \le v_l^n$ and reject $H_0$ if $v_n \ge v_u^n$.
Approximation of the Likelihood Function for Constructing an Approximate t-Test 
Now we want to use certain functions $z$ and $v$ related to the Taylor expansion of the log-likelihood function to construct simple sequential tests.

Let the sequence $(y_1, y_2, \ldots)$ of identically distributed and independent random variables be distributed as $y$ with the likelihood function $f(y; \theta)$. We expand $l(y; \theta) = \ln f(y; \theta)$ according to Taylor with respect to $\theta$ at $\theta = 0$ up to the second order with a third-order error term:

$$l(y; \theta) = l(y; 0) + \theta \cdot l_\theta(y; 0) + \frac{1}{2}\theta^2 l_{\theta\theta}(y; 0) + O(\theta^3). \tag{3.70}$$

Now we put

$$z = l_\theta(y; 0) = \left.\frac{\partial \ln f(y, \theta)}{\partial\theta}\right|_{\theta=0}, \tag{3.71}$$

$$-v = l_{\theta\theta}(y; 0) = \left.\frac{\partial^2 \ln f(y, \theta)}{\partial\theta^2}\right|_{\theta=0}. \tag{3.72}$$

If we neglect the error term $O(\theta^3)$, we get a quadratic approximation around $\theta = 0$:

$$l(y; \theta) = \text{const.} + \theta \cdot z - \frac{1}{2}\theta^2 v. \tag{3.73}$$

In the case where the likelihood function depends also on a vector $\tau = (\tau_1, \ldots, \tau_k)^T$ of nuisance parameters, Whitehead (1997) proposed to replace this vector by the vector of the corresponding maximum likelihood estimators.

Then the likelihood function reads $f(y; \theta, \tau)$ and has the logarithm $l(y; \theta, \tau) = \ln f(y; \theta, \tau)$.

We denote the maximum likelihood estimator of $\tau$ by $\tilde\tau(\theta)$, which supplies $\tilde\tau = \tilde\tau(0)$ for $\theta = 0$. The maximum likelihood estimator of $\tau$ is a solution of the simultaneous equations

$$\frac{\partial}{\partial\tau} l(y; \theta, \tau) = 0$$

and under natural assumptions also the unique solution, since $l(y; \theta, \tau)$ is then concave (convex from above) in $\tau$ and has therefore only one maximum. We expand $\tilde\tau(\theta)$ according to Taylor at $\theta = 0$ with a quadratic error term:

$$\tilde\tau(\theta) = \tilde\tau + \theta \cdot \left.\frac{\partial}{\partial\theta}\tilde\tau(\theta)\right|_{\theta=0} + O(\theta^2). \tag{3.74}$$
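The vector $\tilde\tau_\theta = \frac{\partial}{\partial\theta}\tilde\tau(\theta)$ is the first derivative of $\tilde\tau(\theta)$ with respect to $\theta$. The matrix of the second partial derivatives (also called the Hessian matrix) of $\ln f(y; \theta, \tau)$ with respect to $\tau_i$ and $\tau_j$ for $\tau = \tilde\tau$ is denoted by $M_{\tau\tau}(y, \theta, \tilde\tau(\theta))$.

After some rearrangements (see Whitehead, 1997), we can write $z$ and $v$ with the notations

$$l_\theta(y; 0, \tilde\tau) = \left.\frac{\partial \ln f(y, \theta, \tau)}{\partial\theta}\right|_{\theta=0,\,\tau=\tilde\tau},$$

$$l_{\theta\theta}(y; 0, \tilde\tau) = \left.\frac{\partial^2 \ln f(y, \theta, \tau)}{\partial\theta^2}\right|_{\theta=0,\,\tau=\tilde\tau}; \quad l_{\theta\tau}(y; 0, \tilde\tau) = \left.\frac{\partial^2 \ln f(y, \theta, \tau)}{\partial\theta\,\partial\tau}\right|_{\theta=0,\,\tau=\tilde\tau}$$

in the following form:

$$z = l_\theta(y; 0, \tilde\tau), \tag{3.75}$$

$$v = -l_{\theta\theta}(y; 0, \tilde\tau) - l_{\theta\tau}(y; 0, \tilde\tau)^T \cdot M_{\tau\tau}(y, 0, \tilde\tau(0)) \cdot l_{\theta\tau}(y; 0, \tilde\tau). \tag{3.76}$$

Using the $z$- and $v$-values in (3.71) and (3.72) in the case without nuisance parameters or in (3.75) and (3.76) in the case with nuisance parameter(s), unique approximate sequential likelihood ratio tests (SLRT) can be constructed.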

After observing $n$ elements $y_1, y_2, \ldots, y_n$ of the sequence, which are distributed as $y$ with the log-likelihood function $l(y; \theta, \tau) = \ln f(y; \theta, \tau)$, we write the $z$-function in (3.71) and (3.75), respectively, as

$$z_n = \sum_{i=1}^{n} z_i = \sum_{i=1}^{n} l_\theta(y_i; 0, \tilde\tau),$$

where the estimator of the nuisance parameter is put 0 if it is missing. The number $z_n$ represents the efficient value and characterises the deviation from the null hypothesis.

The $v$-function is connected with the Fisher information matrix

$$I(\theta) = -E_\theta\left\{\frac{\partial^2 l(y_1, \ldots, y_n; \theta, \tau)}{\partial\theta^2}\right\} = n \cdot E_\theta(i(\theta)),$$

where

$$i(\theta) = -\frac{\partial^2 l(y, \theta, \tau)}{\partial\theta^2}$$

is the information of a single observation.
Now we put $v = n E[i(\theta)]|_{\theta=0}$. Since likelihood estimators are asymptotically normally distributed, the variable

$$z_n = \sum_{i=1}^{n} z_i = \sum_{i=1}^{n} l_\theta(y_i; 0, \tilde\tau)$$

is asymptotically approximately normally distributed with the expectation $\theta v$ and the variance $v$.

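After observing $n$ elements $y_1, y_2, \ldots, y_n$ distributed as $y$, we write the $v$-functions (3.72) and (3.76) in the form

$$v_n = \sum_{i=1}^{n}\left\{-l_{\theta\theta}(y_i; 0, \tilde\tau) - l_{\theta\tau}(y_i; 0, \tilde\tau)^T \cdot M_{\tau\tau}(y_i, 0, \tilde\tau(0)) \cdot l_{\theta\tau}(y_i; 0, \tilde\tau)\right\}.$$

For testing the null hypothesis

$$H_0: \theta = 0; \ \theta \in \Omega \subset R^1 \quad \text{against} \quad H_A: \theta = \theta_1 \ne 0; \ \theta_1 \in \Omega \subset R^1,$$

we use the approximate SLRT as follows:


• Continue taking observations while

$$a_l = \frac{1}{\theta_1}\ln\frac{\beta}{1-\alpha} < z_n - bv_n < a_u = \frac{1}{\theta_1}\ln\frac{1-\beta}{\alpha} \quad \text{with } b = \frac{1}{2}\theta_1.$$

• Accept $H_A: \theta = \theta_1 > 0$ if $z_n - bv_n > a_u = \frac{1}{\theta_1}\ln\frac{1-\beta}{\alpha}$, and accept $H_A: \theta = \theta_1 < 0$ if $z_n - bv_n < a_u = \frac{1}{\theta_1}\ln\frac{1-\beta}{\alpha}$, respectively; otherwise accept $H_0$.


The power function of the test is

$$\pi(\theta) \approx \frac{1 - \left(\frac{\beta}{1-\alpha}\right)^{1-\frac{2\theta}{\theta_1}}}{\left(\frac{1-\beta}{\alpha}\right)^{1-\frac{2\theta}{\theta_1}} - \left(\frac{\beta}{1-\alpha}\right)^{1-\frac{2\theta}{\theta_1}}} \quad \text{for } \theta \ne 0.5\theta_1$$

and

$$\pi(\theta) = \frac{\ln\frac{1-\alpha}{\beta}}{\ln\frac{1-\beta}{\alpha} + \ln\frac{1-\alpha}{\beta}} \quad \text{for } \theta = 0.5\theta_1.$$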


The expected sample size is given by 


1-25 a 1-25 
A Hh-(4) In p ( (* *) 
a l-a l-a a 


E(n|0) = for 040.50; 


and 
The null hypothesis $H_0: \theta = 0$ can be used w.l.o.g., as it is obtained if in the general hypothesis $H_0: \mu = \mu_0 \ne 0$ the value $\mu_0$ is subtracted from all observation values.


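Now we consider the normal distribution with unknown variance by using the approximation given in this section. We test

$$H_0: \mu = 0; \ \sigma^2 \text{ arbitrary} \quad \text{against} \quad H_A: \mu = \mu_1, \ \mu_1 \ne 0; \ \mu_1 \in \Omega \subset R^1.$$

We put $\tau = \sigma^2$ and $\theta = \mu/\sigma$. Then the log-likelihood function reads

$$l(y_1, y_2, \ldots, y_n; \theta, \tau) = -\frac{n}{2}\ln(2\pi\tau) - \frac{1}{2\tau}\sum_{i=1}^{n}(y_i - \mu)^2 = -\frac{n}{2}\ln(2\pi\tau) - \frac{1}{2}\sum_{i=1}^{n}\left(\frac{y_i}{\sqrt{\tau}} - \theta\right)^2. \tag{3.77}$$

The efficient value is

$$z_n = \frac{\sum_{i=1}^{n} y_i}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^2}}. \tag{3.78}$$

The $v$-function is

$$v_n = n - \frac{z_n^2}{2n}. \tag{3.79}$$

The formulae for $z_n$ and $v_n$ are listed for some distributions in Table 3.6.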
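
The statistics (3.78) and (3.79) and the continuation rule of the approximate SLRT can be sketched in Python as follows. The function names are illustrative assumptions, and the decision function covers only the case $\theta_1 > 0$; observations are assumed to be already centred so that $H_0: \mu = 0$ applies.

```python
import math


def score_statistics(ys):
    """z_n and v_n from (3.78) and (3.79) for a normal sample with unknown variance."""
    n = len(ys)
    total = sum(ys)
    mean_square = sum(y * y for y in ys) / n
    z_n = total / math.sqrt(mean_square)
    v_n = n - z_n**2 / (2.0 * n)
    return z_n, v_n


def approx_slrt_step(ys, theta1, alpha, beta):
    """One evaluation of the approximate SLRT; only theta1 > 0 is handled here."""
    if theta1 <= 0:
        raise ValueError("this sketch covers theta1 > 0 only")
    z_n, v_n = score_statistics(ys)
    a_low = math.log(beta / (1.0 - alpha)) / theta1
    a_up = math.log((1.0 - beta) / alpha) / theta1
    stat = z_n - 0.5 * theta1 * v_n          # z_n - b * v_n with b = theta1 / 2
    if stat >= a_up:
        return "accept HA"
    if stat <= a_low:
        return "accept H0"
    return "continue"


print(score_statistics([1.0, 1.0, 1.0, 1.0]))
print(approx_slrt_step([1.0], theta1=1.0, alpha=0.05, beta=0.1))
```

In a real application the function would be called again after each new observation (or after each block of observations) with the enlarged sample.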


3.6.4 Approximate Tests for the Two-Sample Problem 


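We have two distributions with parameters $\theta_1$, $\theta_2$ and a common nuisance parameter $\psi$. Two random samples $(y_{i1}, \ldots, y_{in_i})$ of size $n_i$ with $i = 1, 2$ are sequentially drawn to test the null hypothesis

$$H_0: \theta_1 = \theta_2$$

against one of the following alternative hypotheses:


a) $H_A: \theta_1 > \theta_2$.
b) $H_A: \theta_1 < \theta_2$.
c) $H_A: \theta_1 \ne \theta_2$.


Table 3.6 Values $z_n$ and $v_n$ for special distributions.

Distribution | Log-likelihood | Hypotheses | $z_n$ | $v_n$
Normal, $\sigma^2$ known | $-\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum(y_i - \mu)^2$ | $H_0: \mu = 0$; $H_A: \mu = \mu_1$ | $z_n = \frac{\sum y_i}{\sigma}$ | $v_n = n$
Normal, $\sigma^2$ unknown | $-\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum(y_i - \mu)^2$ | $H_0: \mu = 0$; $H_A: \mu = \mu_1$ | $z_n = \frac{\sum y_i}{\sqrt{\sum y_i^2 / n}}$ | $v_n = n - \frac{z_n^2}{2n}$
Bernoulli | $y\ln\frac{p}{1-p} + n\ln(1-p)$ | $H_0: p = p_0$ | $z_n = \frac{y - np_0}{p_0(1-p_0)}$ | $v_n = \frac{n}{p_0(1-p_0)}$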


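A suitable reparametrisation is

$$\theta = \frac{1}{2}(\theta_1 - \theta_2); \quad \varphi = \frac{1}{2}(\theta_1 + \theta_2).$$

The log-likelihood function of both samples reads

$$l(y_1, y_2; \theta, \varphi, \psi) = l^{(1)}(y_1; \theta_1, \psi) + l^{(2)}(y_2; \theta_2, \psi).$$

For simplification we now omit the arguments of the functions. Then the first and second partial derivatives of $l$ are

$$l_\theta = l^{(1)}_{\theta_1} - l^{(2)}_{\theta_2},$$
$$l_\varphi = l^{(1)}_{\theta_1} + l^{(2)}_{\theta_2},$$
$$l_{\theta\theta} = l_{\varphi\varphi} = l^{(1)}_{\theta_1\theta_1} + l^{(2)}_{\theta_2\theta_2},$$
$$l_{\theta\varphi} = l^{(1)}_{\theta_1\theta_1} - l^{(2)}_{\theta_2\theta_2},$$
$$l_{\theta\psi} = l^{(1)}_{\theta_1\psi} - l^{(2)}_{\theta_2\psi},$$
$$l_{\varphi\psi} = l^{(1)}_{\theta_1\psi} + l^{(2)}_{\theta_2\psi},$$
$$l_{\psi\psi} = l^{(1)}_{\psi\psi} + l^{(2)}_{\psi\psi}.$$

The estimates $\tilde\varphi$, $\tilde\psi$ are solutions of

$$l^{(1)}_{\theta_1}(\tilde\varphi, \tilde\psi) + l^{(2)}_{\theta_2}(\tilde\varphi, \tilde\psi) = 0,$$

$$l^{(1)}_{\psi}(\tilde\varphi, \tilde\psi) + l^{(2)}_{\psi}(\tilde\varphi, \tilde\psi) = 0.$$

Now we can continue as described in Section 3.6.2. We do not want to go into detail here, since we are more interested in focusing on triangular tests (see the following Section 3.6.5).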

Naturally, it is wrong to prefer sequential tests to non-sequential ones in every case. Their advantage holds at most for the mean sample size: in a single sequential test the actual $n$ at the final decision can be larger than the size of a test with a priori given size. Besides, the time needed for these kinds of experiments has to be taken into account. A sequential experiment lasts at least $n$ times as long as an experiment of fixed size $n$. Sequential evaluations are beneficial (compared with other methods) in such cases where the data arise sequentially anyway (e.g. in medical tests or treatments for patients with rare diseases).


3.6.5 Sequential Triangular Tests 


In this section we turn to special closed tests, the triangular tests.

The values of the decision statistics of the triangular tests correspond to those of the approximate tests in the previous section. In a suitable coordinate system, the sequence of these values (ordinates) generates, as a function of successive points in time or sample sizes (abscissas), a sequential path. The population is here the set of all possible paths. Within the plane coordinate system, a triangular 'zone of continuation' is defined, which contains the origin of the time axis. While the path runs in this zone, sampling is continued. If the path meets or crosses the boundary of the zone, then the data collection is finished. The decision whether the null hypothesis is accepted or rejected depends on the point of boundary crossing. The separation of the boundary into two parts has to fulfil the condition that for true $H_0$ the part for rejecting $H_0$ is reached at most with probability $\alpha$ and for true $H_A$ the part for accepting $H_0$ is reached at most with probability $\beta$. We want to point out that it is not necessary to make an evaluation after each newly taken sample value and to wait for the evaluation result before the sampling is continued. The sequential path is independent of the decision whether an evaluation is made or not. Hence, it is definitely possible to restrict the evaluations to certain time points or sample sizes, fixed in advance or chosen ad hoc.

Regarding sequential triangular tests for a parameter $\theta$, we want to use the standardised null hypothesis $\theta = 0$. Otherwise a reparametrisation of $\theta$ has to be chosen so that $\theta$ takes the value 0 at the reference value $\theta_0$ of the null hypothesis.

The two variables $z$ and $v$ create the sequential path. They are derived from the likelihood function $L(\theta)$ as described in Section 3.6.3. More precisely, $z$ and $v$ are introduced by using derivatives of $l(\theta) = \ln L(\theta)$ with respect to $\theta$. Namely, $z$ is the first derivative of $l(\theta)$, and $v$ is the negative second derivative of $l(\theta)$ with respect to $\theta$, both taken at $\theta = 0$. If we replace the sample values in the likelihood function by the corresponding random variables (whose realisations represent the sample), then $z$ becomes a random variable itself, which for not too small sample sizes and not too large absolute values of $\theta$ is nearly normally distributed with the expectation $\theta v$ and the variance $v$. Therefore $z$ can be considered as a measure for the deviation of the parameter $\theta$ from the value 0 of the null hypothesis. The variable $v$ characterises the amount of information in the sample with respect to the parameter $\theta$. This amount increases with increasing sample size, that is, $v$ is a monotone increasing function of the sample size.

For triangular tests the continuation zone is a closed triangular set. These tests are based on the asymptotic tests of the previous section.
We consider the one-sample problem and test

$$H_0: \theta = 0 \quad \text{against} \quad H_A: \theta = \theta_1.$$

The continuation zone is given by

$$-a + 3bv_n < z_n < a + bv_n \quad \text{for } \theta_1 > 0,$$
$$-a + bv_n < z_n < a + 3bv_n \quad \text{for } \theta_1 < 0,$$

where the sequence $(z_n; v_n)$ is defined in (3.75) and (3.76).
The alternative hypothesis $H_A: \theta = \theta_1$ is accepted if

$$z_n \ge a + bv_n \quad \text{for } \theta_1 > 0$$

and if

$$z_n \le -a + bv_n \quad \text{for } \theta_1 < 0.$$

If $z_n$ leaves the continuation zone through the other boundary line or meets it, then $H_0: \theta = 0$ is accepted.
The constants $a$ and $b$ are determined by

$$a = \frac{\left(1 + \frac{z_{1-\beta}}{z_{1-\alpha}}\right)\ln\left(\frac{1}{2\alpha}\right)}{\theta_1}, \tag{3.80}$$

$$b = \frac{\theta_1}{2\left(1 + \frac{z_{1-\beta}}{z_{1-\alpha}}\right)}. \tag{3.81}$$

Both straight lines on the boundary meet in the point

$$(v_{max}; z_{max}) = \left(\frac{a}{b}; 2a\right).$$

If this point is reached, we accept $H_A: \theta = \theta_1$. The point of intersection corresponds to the maximal sample size. This size is larger than that of experiments with fixed size for equally prescribed precision, but the latter is larger than the average sample size (ASN) of the triangular test.

Now some special cases follow, which can be solved using the software OPDOE in R.

First we consider the problem of Example 3.11. Let $S = (y_1, y_2, \ldots)$ be a sequence with components distributed together with $y$ as $N(\mu, \sigma^2)$. We want to test

$$H_0: \mu = \mu_0, \ \sigma^2 \text{ arbitrary}, \quad \text{against} \quad H_A: \mu = \mu_1, \ \sigma^2 \text{ arbitrary}.$$

Then we get

$$z_n = \frac{\sum_{i=1}^{n} y_i}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^2}}; \quad v_n = n - \frac{z_n^2}{2n}.$$

The boundary lines of the triangle follow from (3.80) and (3.81) putting

$$\theta_1 = \frac{\mu_1 - \mu_0}{\sigma}.$$

Regarding the two-sample problem, we test analogously to Section 3.4.2.1

$$H_0: \mu_1 = \mu_2 = \mu, \ \sigma_1^2 = \sigma_2^2 = \sigma^2 \text{ arbitrary}$$

against

$$H_A: \mu_1 \ne \mu_2, \ \sigma_1^2 = \sigma_2^2 = \sigma^2 \text{ arbitrary}.$$

We put $\theta_1 = \frac{\mu_1 - \mu_2}{\sigma}$ and calculate from the $n_1$ and $n_2$ observations, respectively, the maximum likelihood estimator

$$\tilde\sigma_n^2 = \frac{\sum_{i=1}^{n_1}(y_{1i} - \bar y_1)^2 + \sum_{i=1}^{n_2}(y_{2i} - \bar y_2)^2}{n_1 + n_2}.$$

Then we introduce

$$z_n = \frac{n_1 n_2}{n_1 + n_2}\cdot\frac{\bar y_1 - \bar y_2}{\tilde\sigma_n}, \quad v_n = \frac{n_1 n_2}{n_1 + n_2} - \frac{z_n^2}{2(n_1 + n_2)}.$$

The constants $a$ and $b$ result again from (3.80) and (3.81). Many tests can be derived analogously from this general theory. More details on the R-files and examples using concrete data and including the accompanying triangles are presented in Rasch et al. (2011b). We want to clarify only one special case, since it stands out from the usual frame. This case was investigated only recently by Schneider et al. (2014).


3.6.6 A Sequential Triangular Test for the Correlation Coefficient 


We suppose that the distribution $F(x, y)$ of a two-dimensional continuous random vector $(x, y)$ has finite second moments $\sigma_x^2$, $\sigma_y^2$ and $\sigma_{xy}$. Then the correlation coefficient $\rho = \sigma_{xy}/(\sigma_x\sigma_y)$ of the distribution exists and can be calculated. We want to test the null hypothesis $H_0: \rho \le \rho_0$ (or $\rho \ge \rho_0$) against the alternative $H_A: \rho > \rho_0$ (or $\rho < \rho_0$). The probability of rejecting $H_0$ although $\rho \le \rho_0$ (or $\rho \ge \rho_0$) is to be at most $\alpha$, and the probability of rejecting $H_A$ although $\rho = \rho_1 > \rho_0$ (or $\rho = \rho_1 < \rho_0$) is to be at most $\beta$.

The empirical correlation coefficient $r = s_{xy}/(s_x s_y)$, which is determined from $k$ data pairs $(x_i, y_i)$ with $i = 1, \ldots, k$ as realisations of $(x, y)$, is an estimate of the parameter $\rho$ (where $s_{xy}$, $s_x^2$, $s_y^2$ are the empirical covariance and the empirical variances, respectively). Naturally $r$ can be used as a test statistic for $\rho$. Fisher (1915) derived the distribution of $r$ assuming a two-dimensional (bivariate) normal distribution. He showed that this distribution depends only on the sample size and $\rho$. Later Fisher (1921) introduced the transformed variable

$$z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) \tag{3.82}$$

as a test statistic. Moreover, he proved that already for small $k$ the distribution of this statistic can be quite well approximated by a normal distribution. Following Cramér (1946), $k = 10$ already suffices to get a very good approximation by a normal distribution for the interval $-0.8 < \rho < +0.8$. We propose here for technical reasons a further transformation, namely, $u = 2z$. This statistic has approximately the following expectation and variance:

$$E(u) = \zeta(\rho) = \ln\frac{1+\rho}{1-\rho} + \frac{\rho}{k-1}; \quad \mathrm{var}(u) = \frac{4}{k-3}. \tag{3.83}$$


If we look for a usable triangular test for hypotheses about the correlation coefficient $\rho$, then the sequence of single data pairs $(x_i, y_i)$ is unsuited, since their likelihood function depends not only on $\rho$ but also on the expectations and variances of the two variables $x$ and $y$ (altogether five parameters), which cannot be estimated from one data pair alone. We need at least three data pairs. This suggests the idea of first generating from the sequence of data pairs $(x_i, y_i)$ successive partial samples of arbitrarily chosen size $k$ and of calculating from the data of each partial sample $j$ a test statistic possessing a known distribution, which depends on the parameter $\rho$. An obvious candidate for this is the already introduced $z$-statistic of Fisher (used here with $u = 2z$ instead of $z$). As mentioned above, this statistic is for not too small sample sizes $k$ approximately normally distributed with the expectation $\zeta(\rho)$ and the variance $4/(k-3)$ (see (3.83)). As we decided to use the standardised parameter value 0 for the null hypothesis of the triangular test, we transform for testing the hypothesis $\rho = \rho_0$ the $u$-values into $u^*$-values so that they have for $\rho = \rho_0$ the expectation 0 and the variance 1. Hence, our triangular test will use the sequence

$$u_j^* = \left(u_j - \ln\frac{1+\rho_0}{1-\rho_0} - \frac{\rho_0}{k-1}\right)\sqrt{\frac{k-3}{4}}$$

for $j = 1, 2, \ldots$. The expectation of $u_j^*$ is the tested parameter $\theta$:

$$\theta = E(u_j^*) = \left(\ln\frac{1+\rho}{1-\rho} + \frac{\rho}{k-1} - \ln\frac{1+\rho_0}{1-\rho_0} - \frac{\rho_0}{k-1}\right)\frac{\sqrt{k-3}}{2}. \tag{3.84}$$

For $\rho = \rho_0$ we get the wanted standard $\theta = 0$. The value for $\rho = \rho_1$ is denoted by $\theta_1$.

The numbers $u_j^*$ that are calculated from the consecutively drawn partial samples $j$ with the empirical correlation coefficients $r_j$ (implicitly contained in $u_j^*$) are realisations of independent (approximately) normally distributed random variables with the expectation $\theta$ and the variance 1. If $m$ consecutive values $u_j^*$ are available, then the log-likelihood function involving these $m$ values reads as

$$l(\theta) = \text{const.} - \frac{1}{2}\sum_{j=1}^{m}(u_j^* - \theta)^2. \tag{3.85}$$

Now we write again $z$ for $u$ to be in accordance with the usual notation for triangular tests. This rewriting leads to the test statistics

$$z_m = \left.\frac{dl(\theta)}{d\theta}\right|_{\theta=0} = \sum_{j=1}^{m} u_j^*; \quad v_m = -\left.\frac{d^2 l(\theta)}{d\theta^2}\right|_{\theta=0} = m. \tag{3.86}$$

Using a $(z, v)$-coordinate system, the sequential path is generated by the points $(z_m, v_m)$ obtained by the evaluation steps $m = 1, 2, 3, \ldots$.

The continuation zone is a triangle whose sides are determined by two constants $a$ and $c$ depending on the risks $\alpha$, $\beta$, the sample size $k$ and the value $\theta_1$ of the alternative hypothesis:

$$a = \frac{\left(1 + \frac{z_{1-\beta}}{z_{1-\alpha}}\right)\ln\left(\frac{1}{2\alpha}\right)}{\theta_1}, \quad c = \frac{\theta_1}{2\left(1 + \frac{z_{1-\beta}}{z_{1-\alpha}}\right)}. \tag{3.87}$$

Here $z_P$ denotes the $P$-quantile of the standard normal distribution. One side of the triangle lies on the $z$-axis, extending from $a$ to $-a$. The two other sides are created by the straight lines

$$G_1: z = a + cv \quad \text{and} \quad G_2: z = -a + 3cv, \tag{3.88}$$

which meet in the point with the coordinates

$$v_{max} = \frac{a}{c}; \quad z_{max} = 2a. \tag{3.89}$$

For $\theta_1 > 0$ we have $a > 0$ and $c > 0$. The upper side of the triangle starting from $a$ on the $z$-axis has the slope $c$, while the lower side starting from $-a$ on the $z$-axis has the slope $3c$ with respect to the $v$-axis. For $\theta_1 < 0$ we have $a < 0$ and $c < 0$. Now the upper side starting from $-a$ has the slope $3c$, and the lower side starting from $a$ has the slope $c$ with respect to the $v$-axis.

The decision rule is as follows: continue making observations as long as

$$-a + 3cv_m < z_m < a + cv_m \quad \text{for } \theta_1 > 0, \text{ or} \tag{3.90}$$
$$-a + 3cv_m > z_m > a + cv_m \quad \text{for } \theta_1 < 0$$

holds. In the case $\theta_1 > 0$ the alternative $H_A$ has to be accepted if $z_m$ reaches at $v_m$ the straight line $z = a + cv_m$ or goes over it, and $H_0$ has to be accepted if $z_m$ reaches at $v_m$ the value $z = -a + 3cv_m$ or goes under it. In the case $\theta_1 < 0$ the alternative $H_A$ has to be accepted if $z_m$ reaches the value $z = a + cv_m$ or goes under it, and $H_0$ has to be accepted if $z_m$ reaches the value $z = -a + 3cv_m$ or goes over it. If the apex of the triangle is hit exactly, then $H_A$ has to be accepted.
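
The decision rule for the case $\theta_1 > 0$ can be written down in a few lines. The following Python sketch is an illustration only: the function names are assumptions made here, the values $u_j^*$ are taken as given, and $v_m = m$ is used as in (3.86).

```python
def triangular_decision(z, v, a, c):
    """Decision at the point (z, v) for theta1 > 0, i.e. a > 0 and c > 0."""
    if z >= a + c * v:            # upper line G1 reached or crossed (includes the apex)
        return "accept HA"
    if z <= -a + 3.0 * c * v:     # lower line G2 reached or crossed
        return "accept H0"
    return "continue"


def run_triangular_test(u_stars, a, c):
    """Follow the sequential path z_m = sum of u_j*, v_m = m, until a boundary is hit."""
    z = 0.0
    m = 0
    for m, u in enumerate(u_stars, start=1):
        z += u
        decision = triangular_decision(z, float(m), a, c)
        if decision != "continue":
            return decision, m
    return "continue", m


# Artificial input: constant standardised values u_j* = 1, constants as in Example 3.25
print(run_triangular_test([1.0] * 10, a=6.5, c=0.1771))
```

Because the upper boundary is checked first, the apex of the triangle leads to acceptance of $H_A$, in line with the rule stated above.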


Example 3.25 We want to test the null hypothesis $\rho \le 0.6$ against the alternative hypothesis $\rho > 0.6$, prescribing the risks $\alpha = 0.05$, $\beta = 0.2$ and the minimum deviation $\rho_1 - \rho_0 = 0.1$. Further we choose $k = 12$. This means that we calculate one correlation coefficient from each partial sample of 12 data pairs. Then we get for $\rho_1 = 0.7$ and for $\rho_0 = 0.6$ the values

$$\zeta(0.7) = \ln\left(\frac{1+0.7}{1-0.7}\right) + \frac{0.7}{11} = 1.798$$

and

$$\zeta(0.6) = \ln\left(\frac{1+0.6}{1-0.6}\right) + \frac{0.6}{11} = 1.441.$$

Because of $\sqrt{k-3} = 3$ the Formula (3.84) supplies

$$\theta_1 = \frac{3}{2}(1.798 - 1.441) = 0.5355.$$

Taking $z_{0.8} = 0.8416$ and $z_{0.95} = 1.6449$ into account, the sides of the triangle result by (3.87) from

$$a = \frac{\left(1 + \frac{0.8416}{1.6449}\right)\ln\left(\frac{1}{0.1}\right)}{0.5355} = 6.50$$

and

$$c = \frac{0.5355}{2\left(1 + \frac{0.8416}{1.6449}\right)} = 0.1771$$

(see Figure 3.11). Further, (3.89) supplies

$$v_{max} = \frac{6.50}{0.1771} = 36.7, \quad z_{max} = 13.$$

The number $n_{fix}$ of observations can be calculated for a test with fixed sample size under corresponding precision requirements using the software R or the iterative procedure

$$n_j = 3 + 4\left[\frac{z_{1-\alpha} + z_{1-\beta}}{\ln\left(\frac{1+\rho_1}{1-\rho_1}\right) - \ln\left(\frac{1+\rho_0}{1-\rho_0}\right) + \frac{\rho_1 - \rho_0}{n_{j-1} - 1}}\right]^2,$$

where the result at the end of the iteration is denoted by $n_{fix}$.
Schneider et al. (2014) as well as Rasch and Yanagida (2015) investigated the approximation quality of such tests.
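
The constants of Example 3.25 can be reproduced with the Python standard library. This is a sketch under assumptions: the function names are illustrative and not taken from OPDOE, and `fixed_sample_size` assumes that the iteration for the fixed sample size has the form $n_j = 3 + 4[\ldots]^2$ with $\zeta(\rho_1) - \zeta(\rho_0)$ evaluated at $k = n_{j-1}$. Since the code works with unrounded intermediate values, its results differ slightly from the rounded values of the example.

```python
import math
from statistics import NormalDist


def zeta(rho, k):
    """Approximate expectation of u = 2z for partial samples of size k, see (3.83)."""
    return math.log((1.0 + rho) / (1.0 - rho)) + rho / (k - 1)


def triangle_constants(rho0, rho1, k, alpha, beta):
    """theta1 from (3.84), a and c from (3.87), v_max and z_max from (3.89)."""
    theta1 = 0.5 * math.sqrt(k - 3) * (zeta(rho1, k) - zeta(rho0, k))
    q = 1.0 + NormalDist().inv_cdf(1.0 - beta) / NormalDist().inv_cdf(1.0 - alpha)
    a = q * math.log(1.0 / (2.0 * alpha)) / theta1
    c = theta1 / (2.0 * q)
    return theta1, a, c, a / c, 2.0 * a


def fixed_sample_size(rho0, rho1, alpha, beta, n_start=50.0, tol=1e-8):
    """Fixed-point iteration for the fixed sample size (assumed form, see lead-in)."""
    z_sum = NormalDist().inv_cdf(1.0 - alpha) + NormalDist().inv_cdf(1.0 - beta)
    n = n_start
    for _ in range(200):
        n_new = 3.0 + 4.0 * (z_sum / (zeta(rho1, n) - zeta(rho0, n))) ** 2
        if abs(n_new - n) < tol:
            return n_new
        n = n_new
    return n


theta1, a, c, v_max, z_max = triangle_constants(0.6, 0.7, k=12, alpha=0.05, beta=0.2)
print(round(theta1, 4), round(a, 2), round(c, 4), round(v_max, 1), round(z_max, 1))
print(fixed_sample_size(0.6, 0.7, alpha=0.05, beta=0.2))
```

Under these assumptions the maximal number of partial samples, about 37 (that is, about 37 times 12 data pairs), clearly exceeds the fixed sample size returned by the iteration, which agrees with the general remark on closed triangular tests in Section 3.6.5.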
Figure 3.11 Graph of the triangle obtained in Example 3.25.


Simulations were carried out with 10 000 repetitions (calculations of the test statistics) using different sample sizes $k$, two-dimensional normally distributed random numbers $x$, $y$ with $\mu_x = \mu_y = 0$, $\sigma_x^2 = \sigma_y^2 = 1$ and a correlation coefficient $\sigma_{xy} = \rho$, nominal risks $\alpha_{nom} = 0.05$, $\beta_{nom} = 0.1$ and $0.2$ as well as some values of $\rho_0$, $\rho_1$. Criteria for the quality of the tests were as follows:


a) The relative frequency of rejecting $H_0$ wrongly, if $\rho = \rho_0$. This is an estimator of the actual risk of the first kind, $\alpha_{act}$.

b) The relative frequency of rejecting $H_A$ if $\rho = \rho_1$. This is an estimator of the actual risk of the second kind, $\beta_{act}$.

c) The mean number of partial samples for determining $r$ and $z$ up to the stop for $\rho_0$ and $\rho_1$.

d) The mean number of pairs $(x, y)$, that is, the ASN, for $\rho_0$ and $\rho_1$ taken over all 10 000 repetitions.

In a special case the sample size can be over or under the value of ASN. Some results are presented in Table 3.7. There are two values of $k$ listed, for which $\alpha_{act}$ lies just under or just over 0.05, with the exception of one case, where $\alpha_{act} = 0.05$ is exactly met. Table 3.8 lists the $k$-values for which $\alpha_{act}$ and $\beta_{act}$ are admissible. The ASN strongly depends on the value of $\rho$. This is demonstrated in the next example.


Example 3.26 In Table 3.7 we consider the case with $\alpha_{act} = 0.05$, $\rho_0 = 0.6$, $\rho_1 = 0.75$, $\alpha = 0.05$, $\beta = 0.1$ and $k = 20$. The following values of $\rho$ were simulated:


$\rho$ = 0.05; 0.1; 0.15; 0.2; 0.25; 0.3; 0.35; 0.4; 0.45; 0.5; 0.55; 0.65; 0.7; 0.8; 0.85; 0.9; 0.95.


Using 10 000 repetitions the ASN and the relative frequency of rejecting $H_0$ are plotted in Figure 3.12. The ASN is shown in Table 3.9; its graph tends for $\rho \to 0$ to 30 and for $\rho \to 1$ to 20. The maximum lies between $\rho = 0.6$ and $\rho = 0.75$.


Table 3.7 Simulation results for $\alpha = 0.05$.

ρ_0    ρ_1    β     k    α_act   β_act   ASN|ρ_0   ASN|ρ_1   n_fix
0.5    0.7    0.1   12   0.060   0.043    74.2      72.2       88
0.5    0.7    0.1   16   0.049   0.053    71.5      72.3       88
0.5    0.7    0.2   12   0.053   0.114    55.7      62.1       65
0.5    0.7    0.2   16   0.042   0.130    54.5      62.3       65
0.6    0.75   0.1   20   0.050   0.049    90.0      90.0      113
0.6    0.75   0.2   12   0.063   0.103    71.4      77.0       82
0.6    0.75   0.2   16   0.052   0.112    67.9      76.1       82
0.6    0.8    0.1   12   0.052   0.040    49        48.2       56
0.6    0.8    0.1   16   0.041   0.047    47        49.1       56
0.6    0.8    0.2   12   0.048   0.109    37.1      41.6       41
0.6    0.8    0.2   16   0.038   0.117    37.0      42.8       41
0.7    0.8    0.1   20   0.064   0.044   128.7     124.3      164
0.7    0.8    0.1   50   0.036   0.053   131.8     137.0      164
0.7    0.8    0.2   16   0.064   0.102    98.2     104.9      119
0.7    0.8    0.2   20   0.057   0.112    96.1     105.5      119
0.7    0.9    0.1    8   0.058   0.029    28.2      26.3       27
0.7    0.9    0.1   12   0.041   0.039    25.8      25.9       27
0.7    0.9    0.2    6   0.066   0.059    24.9      25.2       20
0.7    0.9    0.2    8   0.047   0.085    21.2      23.1       20
0.8    0.9    0.1   16   0.054   0.038    56.9      55.4       65
0.8    0.9    0.1   20   0.045   0.046    56.2      56.4       65
0.8    0.9    0.2   12   0.059   0.094    44.7      47.8       48
0.8    0.9    0.2   16   0.047   0.110    43.3      48.4       48
0.9    0.95   0.1   16   0.058   0.039    61.6      58.8       70
0.9    0.95   0.1   20   0.048   0.040    60.8      60.3       70
0.9    0.95   0.2   16   0.051   0.106    46.4      51.6       51
0.9    0.95   0.2   20   0.041   0.108    46.9      52.7       51


Table 3.8 Admissible results for the simulated $\rho_0$, $\delta$ and $k$ for $\alpha = 0.05$.

        β = 0.1                      β = 0.2
ρ_0    δ      k              ρ_0    δ      k
0.5    0.1    50             0.5    0.1    20<k<50
0.5    0.15   20<k<50        0.5    0.15   20
0.5    0.2    12<k<16        0.5    0.2    12<k<16
0.6    0.1    20<k<50        0.6    0.1    20<k<50
0.6    0.15   20             0.6    0.15   12<k<16
0.6    0.2    12<k<16        0.6    0.2    12<k<16
0.7    0.1    20<k<50        0.7    0.1    16<k<20
0.7    0.15   12<k<16        0.7    0.15   8<k<12
0.7    0.2    8<k<12         0.7    0.2    6<k<8
0.8    0.05   50             0.8    0.05   20<k<50
0.8    0.1    16<k<20        0.8    0.1    12<k<16
0.8    0.15   8              0.8    0.15   6<k<8
0.9    0.05   16<k<20        0.9    0.05   16<k<20


Table 3.9 Empirical ASN in dependence on $\rho$ in Example 3.26.

ρ        0.60    0.65      0.70     0.75
ASN(ρ)   89.98   110.094   115.01   89.98


Figure 3.12 ASN graph of Example 3.26.

Figure 3.13 Empirical power function of Example 3.26.


As shown in Figure 3.12, the maximum of the empirical ASN function lies between the values of the hypotheses, but it is smaller than $n_{fix}$. The empirical power function is plotted in Figure 3.13.

The examples show that the optimal $k$-values can be found in relatively large regions and that the risks of the second kind are conservative. Therefore, Rasch and Yanagida (2015) have developed tables where the user can see which value $k$ has to be chosen and how the nominal risk of the second kind has to be increased so that the actual risk of the second kind equals the desired one and the ASN becomes minimal.


3.7. Remarks about Interpretation 


At the end of a statistical test, we decide on one of two possibilities, namely, accepting or rejecting the null hypothesis. A confidence interval $K(\boldsymbol{Y})$ covers the unknown parameter of a distribution with a certain probability, and the tests $k(\boldsymbol{Y})$ are connected with risks, that is, with probabilities for wrong rejection (risk of the first kind) or wrong acceptance (risk of the second kind) of the null hypothesis. Concerning the mathematical theory no questions arise. But practical applications need some clarification. What can be stated about a realised confidence interval $K(Y)$, and how do we assess the value $k(Y)$ of a critical function $k(\boldsymbol{Y})$ that leads in the non-randomised case either to the acceptance or to the rejection of $H_0$?

Probability statements can never be made for a realisation $K(Y)$ or after accepting $H_A$. Such probabilities relate to the method of constructing confidence intervals and tests, but not to their realisations.

It would be nonsense to say that a realised interval $K(Y) = (4.756; 29.560)$ contains the parameter $\sigma^2$ with a probability 0.95. As we know, the value of this parameter is an unknown but nevertheless fixed non-negative real number. Hence, this parameter value either lies in $K(Y)$ or not. For example, no serious scientist would claim that the number 2 is in the interval (4.756; 29.560) with probability 0.95. Nevertheless, there are some books about applied statistics and, above all, printouts of some statistical software tools, where you can read that a realised interval contains an unknown parameter with a calculated probability. It is therefore no wonder that some students and users repeat this nonsense.

Analogously it is completely wrong to state after rejecting a null hypothesis based on a random sample that this decision is wrong with probability $\alpha$. Evidently this decision is either right or wrong. However, it is correct to say that the decision is based on a procedure that supplies wrong rejections with probability $\alpha$.

Let us turn to a further example. If a single die is thrown, then an even number is obtained with probability 0.5. Assume that the number 3 was thrown. It is nonsense to claim that the number 3 would be even with probability 0.5. Perhaps this simple example is helpful to realise that probability statements concerning realised test results or confidence intervals make no sense.

Therefore the user is recommended to choose $\alpha$ (and $\beta$, respectively) small enough that a rejection (or acceptance) of $H_A$ or a realisation $K(Y)$ lets the user behave with a clear conscience as if $H_A$ were wrong (or $H_A$ right) or as if $\theta$ lay in $K(Y)$. But there is also an important statistical consequence: if the user has to make a lot of such decisions during his/her investigations, then he/she will decide wrongly in about $100\alpha$ (and $100\beta$, respectively) percent of the cases. This is a realistic point of view that can be essentially confirmed by experience. If we move in traffic, we should realise the risk of our own and other people's incorrect actions (observe that in this case $\alpha$ is considerably smaller than 0.05), but we must participate, just as a researcher must derive a conclusion from a random experiment, although he/she knows that it can be wrong. Moreover, it is very important to control risks. Concerning the risk of the second kind, this is only possible if the sample size is determined before the experiment or if it is sequentially tested during the experiment.

The user should take care not to transfer probability statements to single cases.
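
The distinction between a procedure and a single decision can be made tangible by simulation. The following sketch is an illustration under assumed settings (a two-sided test of $\mu = 0$ with known variance 1, sample size 25; the function name is hypothetical): every single rejection of the true null hypothesis is simply wrong, while the share of such wrong decisions over many repetitions is close to $\alpha$.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def wrong_rejection_rate(alpha=0.05, n=25, repetitions=10_000, seed=1):
    """Relative frequency of rejecting a true H0: mu = 0 (sigma = 1 known)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    rejections = 0
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # H0 is true here
        z = fmean(sample) * sqrt(n)                        # test statistic
        if abs(z) > z_crit:
            rejections += 1     # this single decision is wrong, not "wrong with probability alpha"
    return rejections / repetitions


rate = wrong_rejection_rate()
print(f"share of wrong rejections: {rate:.3f}")
```

The printed share refers to the long run of 10 000 decisions; it says nothing about any particular one of them.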


3.8 Exercises 


3.1 Let $P_0$ and $P_A$ be the rectangular distributions acting over the intervals (0,1) and (1,2), respectively. Test the hypothesis $H_0$ that the distribution $P_0$ occurs against the hypothesis $H_A$ that the distribution $P_A$ occurs, taking one observation $y$. The following tests are proposed for given $\alpha \in (0, 1)$:
$$k_1(y) = \begin{cases} \alpha & \text{for } y \in (0,1) \\ 1 & \text{for } y \in (1,2) \end{cases} \quad\text{and}\quad k_2(y) = \begin{cases} 0 & \text{for } y \in (\alpha,1) \\ 1 & \text{for } y \in (0,\alpha) \cup (1,2) \end{cases}$$

a) Show that these tests are most powerful $\alpha$-tests.

b) Present these tests, if possible, in the form (3.5). Does the result contradict the statement (3) of the Neyman-Pearson lemma (Theorem 3.1)? Is one of these tests randomised?


3.2 Test for the geometric distribution with the probability function
$$P(y = k) = p^{k}(1-p), \quad k = 1, 2, \ldots;\ 0 < p < 1$$
the hypothesis $H_0: p = p_0$ against $H_A: p = p_1$ ($p_0 \ne p_1$) based on a random sample $Y = (y_1, y_2, \ldots, y_n)^T$.

a) Formulate the most powerful $\alpha$-test using the test statistic $\bar{y}$.

b) Determine the numbers $c_\alpha$, $\gamma(Y)$ and $\beta$ for this test assuming that $n = 1$, $\alpha = 0.05$, $p_0 = 0.5$, $p_1 = 0.1$ (see Section 3.2).

3.3 Test for the Poisson distribution with a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ of size $n = 10$ the hypothesis $H_0: \lambda = \lambda_0 = 0.1$ against the hypothesis $H_A: \lambda = \lambda_A = 1$. Determine the most powerful $\alpha$-test for $\alpha = 0.01$ and calculate for this test the risk $\beta$ of the second kind.


3.4 Let the lifetime $y$ of certain industrial instruments be exponentially distributed with the density function $f(y) = \lambda e^{-\lambda y}$, $y > 0$. Based on a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ the hypothesis $H_0: \lambda = \lambda_0$ is to be tested against the hypothesis $H_A: \lambda = \lambda_A$, $\lambda_0 \ne \lambda_A$. Determine the most powerful $\alpha$-test.


3.5 Formulate and prove a modification of Theorem 3.8 concerning the UMP-test of $H_0: \theta \le \theta_0$ against $H_A: \theta = \theta_A > \theta_0$, supposing an antitone likelihood ratio (instead of an isotone one) of the distribution family belonging to the sufficient statistic $M = M(Y)$.


3.6 Determine under the assumptions of Exercise 3.4

a) The UMP-$\alpha$-test of $H_0: \lambda \le \lambda_0$ against $H_A: \lambda = \lambda_A > \lambda_0$,
b) The power function of this test,
c) In the case $\lambda_0 = 0.01$, $\alpha = 0.05$ the test result based on the sample

170.8; 211.7; 73.5; 52.1; 11.8; 22.1; 167.6; 26.7; 77.5; 17.3


3.7 Let $Y = (y_1, y_2, \ldots, y_n)^T$ be a random sample whose components are uniformly distributed in the interval $(0, \theta)$. With the notation $y_{(n)} = \max(y_1, \ldots, y_n)$, let
$$k_c(Y) = \begin{cases} 1 & \text{for } y_{(n)} \ge c \\ 0 & \text{otherwise} \end{cases}, \quad (c > 0)$$
be the critical function of a test for the pair $\{H_0, H_A\}$ of hypotheses.

a) Determine the power function $\pi(\theta)$ for $k_c(Y)$ and show that $\pi(\theta)$ is monotone non-decreasing in $\theta$.

b) Test $H_0: \theta \le \frac{1}{2}$ against $H_A: \theta > \frac{1}{2}$ with the significance level $\alpha = 0.05$. For which value $c$ is $k_c(Y)$ an $\alpha$-test for $\{H_0, H_A\}$?

c) Sketch the power function of the test in (b) using $n = 20$. Is this test unbiased?

d) Which number $n$ has to be chosen, so that the test in (b) has for $\theta = 0.6$ a risk 0.02 of the second kind?


3.8 Let the components of the random sample $Y = (y_1, y_2, \ldots, y_n)^T$ satisfy a Rayleigh distribution with the density function
$$f(y, \theta) = \frac{y}{\theta^2}\, e^{-\frac{y^2}{2\theta^2}}, \quad y > 0,\ \theta > 0.$$
The hypothesis $H_0: \theta \le \theta_0 = 1$ is to be tested against $H_A: \theta > \theta_0$.

a) Show that there is a UMP-$\alpha$-test for $\{H_0, H_A\}$, and determine for large $n$ with the help of the central limit theorem approximately the critical function of this test.

b) Determine for large $n$ approximately the power function of this test.


3.9 Let the assumptions of Exercise 3.4 be fulfilled.

a) Show that there is a UMPU-$\alpha$-test for the hypotheses $H_0: \lambda = \lambda_0$, $H_A: \lambda \ne \lambda_0$.

b) Determine for $n = 1$ the simultaneous equations whose solutions $c_{i\alpha}$ $(i = 1, 2)$ are necessary to describe the critical function of this test.

c) Show, for example, in the case $\lambda_0 = 10$, $\alpha = 0.05$, $n = 1$, that the corresponding test for a symmetric partition of $\alpha$ is biased by calculating the power function of this test at $\lambda = 10.1$.


3.10 Let $p$ be the probability that the event $A$ happens. Based on a large sample of size $n$, where this event was observed $h_n$ times, the hypothesis $H_0: p = p_0$ is to be tested against the hypothesis $H_A: p \ne p_0$.

a) Construct an approximate UMPU-$\alpha$-test for these hypotheses by applying the limit theorem of Moivre-Laplace.

b) A coin with head and tail on its faces was tossed 10 000 times, where the tail appeared 5280 times. Check with the help of (a) if it is justified to assume that the coin is not fair (that head and tail do not appear with the same probability). Choose $\alpha = 0.001$.

c) A die is tossed 200 times, where the (side with) number 6 occurs 40 times. Is it justified to claim (with a significance level of 0.05) that this die shows the number 6 with the probability $p = \frac{1}{6}$?


3.11 The milk-fat content of 280 randomly chosen young cows of a cattle breed was determined. The average value in the sample was $\bar{y} = 3.61\%$. We suppose that the random variable $y$ modelling this fat content is normally distributed.

a) Let the variance of the fat content $y$ be given, say, $\sigma^2 = 0.09$. Test the hypothesis that the average milk-fat content of young cows of this breed is $\mu_0 = 3.5\%$ against the alternative that it is larger than 3.5%. Choose $\alpha = 0.01$.

b) Determine the probability that deviations of the population mean $\mu$ of 0.05% fat content from $\mu_0 = 3.5\%$ imply the rejection of the null hypothesis in (a).

c) Which deviations $\delta$ between the value $\mu$ and the reference value $\mu_0 = 3.5\%$ imply in the test from (a) that the null hypothesis is rejected with a probability larger than 0.9?

d) Let the variance $\sigma^2$ of the fat content $y$ be unknown. From a sample of size 49 an estimator $s^2 = 0.076673$ for the variance was calculated. Test the hypotheses in (a) using the significance level $\alpha = 0.01$.


3.12 The producer of a certain car model declares that the fuel consumption for this model is in the city traffic approximately normally distributed with the expectation $\mu = 7.5$ l/100 km and the variance $\sigma^2 = (2.5$ l/100 km$)^2$. These declarations are to be tested to satisfy the interests of the car buyers. Therefore the fuel consumption was measured for 25 (randomly chosen) cars of this model moving in the city traffic (of randomly chosen cities worldwide). Here are the results:

Average fuel consumption: 7.9 l/100 km
Sample variance: (3.2 l/100 km)$^2$

Test the statements of the car producer separately for both parameters choosing $\alpha = 0.05$.


3.13 The milk-fat content of Jersey cows is in general considerably higher than the one of black-coloured cows. It is to be tested whether the variability of the fat content is equal for both breeds or not. A random sample of $n_1 = 25$ Jersey cows supplied the estimator $s_1^2 = 0.128$, while an independent random sample of $n_2 = 31$ black-coloured cows led to the estimator $s_2^2 = 0.072$. The fat content is supposed to be normally distributed in both breeds. Test for $\alpha = 0.05$ the hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ against the alternative

a) $H_A: \sigma_1^2 > \sigma_2^2$,

b) $H_A: \sigma_1^2 \ne \sigma_2^2$.


3.14 Consider a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ whose components are uniformly distributed in the interval $(0, \theta)$, $\theta \in R^+$. Determine the sample size $n$ so that the random interval $(y_{(1)}, y_{(n)})$ of order statistics covers the parameter $\frac{\theta}{2}$ with the probability 0.999.


3.15 Let $Y = (y_1, y_2, \ldots, y_n)^T$ be a random sample whose components are uniformly distributed in the interval $(0, \theta_0)$ with unknown $\theta_0$. Confidence intervals $K(Y)$ are to be constructed with respect to $\theta$ and with the confidence coefficient $1 - \alpha$. They are to be of the form
$$K(Y) = \left[\, y_{(n)} c_1(\alpha_1),\ y_{(n)} c_2(\alpha_2) \,\right]$$
where $\alpha = \alpha_1 + \alpha_2$; $\alpha_1, \alpha_2 < \frac{1}{2}$ holds and $c_1(\alpha_1)$, $c_2(\alpha_2)$ are suitable constants.

a) Construct three confidence intervals: $K_1(Y)$ for $\alpha_1, \alpha_2 < \frac{1}{2}$ arbitrary; $K_2(Y)$ for $\alpha_1 = 0$, $\alpha_2 = \alpha$; and $K_3(Y)$ for $\alpha_1 = \alpha$, $\alpha_2 = 0$.

b) Calculate the expected length $2\delta_i$ of the confidence intervals $K_i(Y)$ with $i = 1, 2, 3$ from (a). Which interval has the smallest expected length?

c) $W(\theta, \theta_0) = P(\theta \in K(Y) \mid \theta_0)$ is called the characteristic function of the confidence estimation $K(Y)$. Calculate the corresponding functions $W(\theta, \theta_0)$ of the intervals $K_i(Y)$ with $i = 1, 2, 3$ given in (a), and sketch these functions for $\theta_0 = 10$, $n = 16$, $\alpha = 0.06$ and $\alpha_1 = 0.04$ in the case of the interval $K_1(Y)$. Which confidence intervals are unbiased?


3.16 a) Determine the one-sided UMP-$(1 - \alpha)$ confidence intervals with respect to $\lambda$ supposing the conditions of Exercise 3.4.

b) Determine the realisations of these confidence intervals based on the sample from Exercise 3.6 (c) using $\alpha = 0.05$.


3.17 Let the assumptions of Section 3.4.2 be satisfied. Determine the UMPU-$(1 - \alpha)$ confidence interval for the quotient $\frac{\sigma_1^2}{\sigma_2^2}$ of variances.


3.18 In a factory certain pieces are produced in large series. The probability $p$ $(0 < p < 1)$ that a piece in a series is defective is unknown. The hypothesis $H_0: p = p_0$ is to be tested against the alternative $H_A: p = p_1$ where $p_0 \ne p_1$. We want to use the following sequential test. Let $n_0$ be a fixed natural number. We successively select independent pieces for the sample. If the $k$th piece $(k \le n_0)$ is defective, then $H_0$ is rejected. But if all $n_0$ pieces are intact, then $H_0$ is accepted.

a) Determine the power function of this test.
b) Calculate the average sample size $E(n \mid p)$.
c) Calculate for $p_0 = 0.01$, $p_1 = 0.1$, $n_0 = 10$ the risks $\alpha$, $\beta$ and the expectations $E(n \mid p_i)$ with $i = 0, 1$.


3.19 Let the components of the random vector $Y = (y_1, y_2, \ldots)^T$ be mutually independently distributed as $N(\mu, \sigma^2)$. The null hypothesis $H_0: \mu = \mu_0$ is to be tested by a 0.05-t-test. Which minimal sample size has to be chosen for a risk 0.1 of the second kind if
a) The alternative $H_A$ is one-sided and the practically relevant minimum difference is $\delta = \frac{1}{4}\sigma$


b) The alternative H, is two-sided and the practically relevant minimum 


1 
difference is 6= ao: 
Hint: Use the approximate formula. 


3.20 Let the components of the random vectors $Y_i = (y_{i1}, y_{i2}, \ldots)^T$; $i = 1, 2$ be independently distributed as $N(\mu_i, \sigma_i^2)$. It is unknown whether $\sigma_1^2 = \sigma_2^2$ holds or not. The null hypothesis $H_0: \mu_1 = \mu_2$ is to be tested with a 0.05-t-test.

b) Which test statistic should be used?

c) Which minimal sample size has to be chosen for a risk 0.1 of the second kind if
i) The alternative H, is one-sided and the practically relevant mini- 


1 
mum difference is 6 = —o? 
ii) The alternative H, is two-sided and the practically relevant min- 


1 
imum difference is 6 = at 


Hint: Use the approximate formula. 


3.21 We consider two independent random samples $Y_1 = (y_{11}, \ldots, y_{1n_1})^T$, $Y_2 = (y_{21}, \ldots, y_{2n_2})^T$ where the components $y_{ij}$ are supposed to be distributed as $N(\mu_i, \sigma_i^2)$ with $i = 1, 2$. The null hypothesis

$H_0: \mu_1 = \mu_2 = \mu$, $\sigma_1^2, \sigma_2^2$ arbitrary

is to be tested against

$H_A: \mu_1 \ne \mu_2$, $\sigma_1^2, \sigma_2^2$ arbitrary.

Construct a UMPU-$\alpha$-test for one-sided alternatives in the case $\sigma_1^2 = \sigma_2^2$.


4 Linear Models - General Theory


4.1. Linear Models with Fixed Effects 


The theory of linear statistical models plays an important role in the applica- 
tions. Mainly the standard methods of analysis of variance and regression anal- 
ysis have become firmly established in evaluating biological and technological 
experiments. 

In this chapter we introduce the general theory concerning methods of analysis of variance and regression analysis with fixed effects. In the following $\Omega \subset R^n$ denotes a p-dimensional linear subspace with $p < n$ called parameter space, and $\theta \in \Omega$ denotes a parameter vector with $n$ coordinates $\theta_i$ $(i = 1, \ldots, n)$.

Further, let $\boldsymbol{Y}$ be an n-dimensional random variable (a random vector) with components $\boldsymbol{y}_i$ $(i = 1, \ldots, n)$ and realisations $Y$ from the n-dimensional sample space $R^n$. Finally, let $e$ be an n-dimensional random variable with $E(e) = 0_n$, $\operatorname{var}(e) = \sigma^2 V$, where $V$ is a symmetric and positive definite matrix of size $(n, n)$ and rank $n$. For constructing tests and confidence intervals, we will later suppose that $e$ (and hence also $\boldsymbol{Y}$) is n-dimensional normally distributed (satisfies an n-variate normal distribution).


Definition 4.1 The equation
$$Y = \theta + e \tag{4.1}$$
including the constraints $\theta \in \Omega$, $E(e) = 0_n$, $\operatorname{var}(e) = \sigma^2 V$ is said to be a general linear model (with fixed effects). If $\omega \subset \Omega$ is a linear subspace of $\Omega$, then the hypothesis $H_0: \theta \in \omega$ is called linear.

The definition of a linear hypothesis obviously implies that $H_A: \theta \notin \omega$ is no linear hypothesis, since $\Omega \setminus \omega$ is no linear subspace of $\Omega$. Namely, linear combinations of elements in this set can, for example, belong to $\omega$. W.l.o.g. we assume $V = I_n$. This is indeed no restriction of generality if $V$ is known, as we will see. Since $V$ is symmetric and positive definite, there is a non-singular matrix $P$ with $V = P^TP$. We introduce the new variable $Z = (P^T)^{-1}Y$ that has the expectation $\lambda = E(Z) = (P^T)^{-1}E(Y) = (P^T)^{-1}\theta$ and the variance
$$\operatorname{var}(Z) = (P^T)^{-1} E\left[(Y-\theta)(Y-\theta)^T\right] P^{-1} = (P^T)^{-1}\operatorname{var}(Y)P^{-1} = \sigma^2 I_n.$$
The model
$$Z = \lambda + e^*, \quad e^* = (P^T)^{-1}e \quad (\lambda \in \Omega^*), \quad E(e^*) = 0_n, \quad \operatorname{var}(e^*) = \sigma^2 I_n$$
has therefore the form (4.1) including corresponding constraints. Since $\Omega$ is mapped by $\lambda = (P^T)^{-1}\theta$ onto a set $\Omega^*$ with $\dim(\Omega) = \dim(\Omega^*)$, $\Omega^*$ is again a p-dimensional linear subspace. Analogously the matrix $(P^T)^{-1}$ maps $\omega$ onto $\omega^*$ where $\dim(\omega) = \dim(\omega^*)$ and $\omega^* \subset \Omega^*$ so that the linearity of the hypothesis is also conserved. Hence we will use $V = I_n$ for linear models.
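
The transformation to $V = I_n$ can be checked numerically for a small case. The sketch below assumes a hypothetical known $2 \times 2$ matrix $V$ and uses its Cholesky factor $L$ (with $V = LL^T$), so that $P = L^T$ is one possible choice with $V = P^TP$; the helper functions are written out only to keep the example self-contained.

```python
from math import sqrt


def cholesky_2x2(v):
    """Lower-triangular L with V = L L^T for a symmetric positive definite 2x2 matrix."""
    l11 = sqrt(v[0][0])
    l21 = v[1][0] / l11
    l22 = sqrt(v[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]


def inverse_lower_2x2(l):
    """Inverse of a lower-triangular 2x2 matrix."""
    return [[1 / l[0][0], 0.0],
            [-l[1][0] / (l[0][0] * l[1][1]), 1 / l[1][1]]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


V = [[4.0, 2.0], [2.0, 3.0]]          # hypothetical known V
L = cholesky_2x2(V)                    # V = L L^T, so P = L^T gives V = P^T P
T = inverse_lower_2x2(L)               # T = (P^T)^{-1} = L^{-1}
whitened = matmul(matmul(T, V), transpose(T))   # (P^T)^{-1} V P^{-1}
print(whitened)                        # numerically the identity matrix
```

Any other non-singular $P$ with $V = P^TP$ would serve equally well; the Cholesky factor is merely a convenient choice.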


4.1.1 Least Squares Method 


First we want to estimate the parameter vector using the least squares method (LSM) (compare Section 2.3.2). An estimator $\hat{\boldsymbol{\theta}}$ for $\theta$ by the LSM is an estimator whose realisations $\hat\theta$ fulfil
$$\|e\|^2 = \|Y - \hat\theta\|^2 = \inf_{\theta \in \Omega} \|Y - \theta\|^2. \tag{4.2}$$


The following theorem is known from approximation theory. 


Theorem 4.1 A realisation $\hat\theta$ of the LSM $\hat{\boldsymbol{\theta}}$ satisfying (4.2) is the orthogonal projection of $Y$ (the realisation of $\boldsymbol{Y}$) onto $\Omega$.


Proof: Let $c_1, \ldots, c_p$ be an orthonormal (vector) basis of $\Omega$. Introducing numbers (scalars) $k_i = Y^Tc_i$, the realisation $Y$ can be written in the form
$$Y = \sum_{i=1}^{p} k_i c_i + \left(Y - \sum_{i=1}^{p} k_i c_i\right) = c + b, \quad c = \sum_{i=1}^{p} k_i c_i, \quad b = Y - c.$$
Because of $c_i^Tb = 0$ the representation $Y = c + b$ supplies a decomposition of $Y$ in the sum of two orthogonal vectors $c \in \Omega$, $b \in \Omega^\perp$.

This decomposition is unique. Assuming that there is another decomposition $Y = c^* + b^*$, we get $c + b = c^* + b^*$ or $c - c^* = b^* - b$. Since $c - c^* \in \Omega$ and $b^* - b \in \Omega^\perp$, it follows $c - c^* = b^* - b = 0$. Hence, the uniquely determined vector $c$ is the orthogonal projection of $Y$ onto $\Omega$.

Finally we have to show that $c = \hat\theta$. Taking $Y - \theta = Y - c + c - \theta$ into account, it follows
$$\|Y - \theta\|^2 = \|Y - c\|^2 + \|c - \theta\|^2 + 2(Y - c)^T(c - \theta). \tag{4.3}$$
Since $c - \theta \in \Omega$ and $b = Y - c \in \Omega^\perp$, the third summand on the right-hand side of (4.3) vanishes, and this side attains its minimum for $\theta = c$, that is, $c = \hat\theta$.


Theorem 4.2 The LSM vector $\hat\theta$ satisfying (4.2) can be obtained from a realisation $Y$ of the random vector $\boldsymbol{Y}$ by the linear transformation
$$\hat\theta = AY \tag{4.4}$$
with a (symmetric) idempotent matrix $A$ of rank $p$.¹
On the other hand, if $A$ is an idempotent matrix of size $(n, n)$ with rank $p$, then the linear transformation $AY$ with $Y \in R^n$ realises the orthogonal projection of $R^n$ onto a p-dimensional vector space.


Proof: First we show (4.4), where $A$ is supposed to be idempotent with rank $p$ ($\operatorname{rk}(A) = p$). Considering the proof of Theorem 4.1, it is
$$\hat\theta = \sum_{i=1}^{p} k_i c_i = \sum_{i=1}^{p} c_i Y^T c_i. \tag{4.5}$$
Because of $Y^Tc_i = c_i^TY$ (transposition rule), we get
$$\hat\theta = \sum_{i=1}^{p} c_i c_i^T Y = (c_1, \ldots, c_p)(c_1, \ldots, c_p)^T Y.$$
With the notations $C = (c_1, \ldots, c_p)$ and $A = CC^T$, the vector $\hat\theta$ takes the form (4.4). Observing $A^T = (CC^T)^T = CC^T = A$ and remembering the vectors $c_i$ to be orthonormal (compactly written as $C^TC = I_p$), it follows $A^TA = CC^TCC^T = A^2 = A$, that is, $A$ is idempotent.

Moreover it is $\operatorname{rk}(A) = \operatorname{rk}(C) = p$.

Now we prove the second part of the theorem. Let $A$ be an idempotent matrix of size $(n, n)$ and with rank $p$. For each such matrix there is an orthogonal matrix $C$ so that $C^TAC = I_p \oplus O_{n-p,n-p}$. Therefore $A$ can be written as $A = (c_1, \ldots, c_p)(c_1, \ldots, c_p)^T$ where the column vectors $c_i$ of $C$ $(i = 1, \ldots, p)$ represent a basis of a p-dimensional subspace of $R^n$.
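
The construction $A = CC^T$ used in the proof can be traced numerically. The following sketch assumes a hypothetical parameter space $\Omega \subset R^4$ spanned by an intercept and a linear trend ($p = 2$) and a hypothetical realisation $Y$; the orthonormal basis is obtained by the Gram-Schmidt procedure, which is one of several possible ways to get such a basis.

```python
from math import sqrt


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(vectors):
    """Orthonormal basis of the span of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for c in basis:
            coeff = dot(w, c)
            w = [wi - coeff * ci for wi, ci in zip(w, c)]
        norm = sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def projector(basis, n):
    """A = C C^T = sum_i c_i c_i^T for orthonormal columns c_i."""
    return [[sum(c[i] * c[j] for c in basis) for j in range(n)] for i in range(n)]


def matvec(m, v):
    return [dot(row, v) for row in m]


def matmul(x, y):
    cols = list(zip(*y))
    return [[dot(row, col) for col in cols] for row in x]


# Hypothetical Omega in R^4 spanned by an intercept and a linear trend (p = 2).
spanning = [[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0]]
C = gram_schmidt(spanning)
A = projector(C, 4)
Y = [2.0, 3.0, 5.0, 4.0]               # hypothetical realisation
theta_hat = matvec(A, Y)                # (4.4)
residual = [y - t for y, t in zip(Y, theta_hat)]

AA = matmul(A, A)
idempotent = all(abs(AA[i][j] - A[i][j]) < 1e-12 for i in range(4) for j in range(4))
symmetric = all(abs(A[i][j] - A[j][i]) < 1e-12 for i in range(4) for j in range(4))
trace = sum(A[i][i] for i in range(4))  # equals rk(A) = p for a projector
print(theta_hat, idempotent, symmetric, round(trace, 12))
```

The residual $Y - \hat\theta$ is orthogonal to both spanning vectors, which is the decomposition $Y = c + b$ of Theorem 4.1.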

If we intend to estimate parameters in linear models for unknown distribu- 
tion, then the LSM is usually applied. A justification for this approach is pre- 
sented in the next theorem. 


Theorem 4.3 Gauss-Markov Theorem

Let $L = a^T\theta$ be a linear form in the parameter vector $\theta \in \Omega$ of model (4.1) with range $R^1$. Then there exists for $a^T\theta$ in the class of all linear estimators with bounded mean square deviation E(SD) a uniquely determined estimator with minimal E(SD); this estimator is the LSM that has the form $a^TAY$ with $A$ described in Theorem 4.2 (see (4.4)).


¹ In the following, we omit the attribute 'symmetric', since all idempotent matrices in this book will be symmetric.

Proof: Let $t^TY$ be a linear estimator for $a^T\theta = L$. We consider the mean square deviation (MSD)
$$E(SD) = E\left(t^TY - a^T\theta\right)^2 = E\left(t^TY - t^T\theta + t^T\theta - a^T\theta\right)^2,$$
which can be written with $E(Y) = \theta$ in the form
$$E(SD) = \operatorname{var}(t^TY) + \left(t^T\theta - a^T\theta\right)^2.$$
E(SD) is only then bounded for all $\theta \in \Omega$ if $t^T\theta - a^T\theta = 0$ for all these $\theta$. Hence, the class of all linear estimators for $a^T\theta$ with bounded E(SD) is described by the equation $t^T\theta - a^T\theta = 0$. The matrix $A$ in (4.4) realises the orthogonal projection of $R^n$ onto $\Omega$. Therefore it is $A\theta = \theta$, and the class of linear estimators with bounded E(SD) is characterised by $At = Aa$ if $(t^T - a^T)A\theta = 0$ for all $\theta$, and consequently $(t^T - a^T)A = 0^T$, is taken into account. This class of estimators satisfies
$$E(SD) = \operatorname{var}(t^TY) = t^Tt\,\sigma^2.$$
Now we have to determine $t$ so that E(SD) is minimised under the condition $At = Aa$ with $A$ from (4.4). We write
$$t^Tt = (t + At - At)^T(t + At - At).$$
Since $A$ is idempotent, we get
$$t^Tt = (At)^T(At) + [(I_n - A)t]^T(I_n - A)t$$
and because of $At = Aa$ also
$$t^Tt = (Aa)^T(Aa) + [(I_n - A)t]^T(I_n - A)t. \tag{4.6}$$
The functional $t^Tt$ in (4.6) and consequently E(SD) are minimised if the second summand of the right-hand side in (4.6) vanishes, that is, if $t = At = Aa$. This supplies the uniquely determined estimator $a^TAY$ for $a^T\theta$.

Many variants of the Gauss-Markov theorem are based on the class of linear unbiased estimators for $a^T\theta$. Among all these estimators the LSM is the one with minimal variance.
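
The decomposition (4.6) can be illustrated for the simplest model, in which $\Omega$ is spanned by $1_n$ and $A$ is the matrix with all entries $1/n$ (cf. Example 4.1). The sketch below assumes $n = 5$ and the linear form $a^T\theta = \theta_1$; the three weight vectors $t$ are hypothetical candidates that all fulfil $At = Aa$.

```python
n = 5
A = [[1.0 / n] * n for _ in range(n)]          # projector onto span(1_n)


def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]


def sq_norm(v):
    return sum(x * x for x in v)


a = [1.0, 0.0, 0.0, 0.0, 0.0]                   # a^T theta = theta_1
Aa = matvec(A, a)                               # weights of the LSM: the sample mean

candidates = {
    "first observation": [1.0, 0.0, 0.0, 0.0, 0.0],
    "mean of two":       [0.5, 0.5, 0.0, 0.0, 0.0],
    "LSM (t = Aa)":      Aa,
}
variances = {}
for name, t in candidates.items():
    At = matvec(A, t)
    assert all(abs(x - y) < 1e-12 for x, y in zip(At, Aa))   # bounded E(SD): At = Aa
    rest = [ti - ati for ti, ati in zip(t, At)]              # (I_n - A) t
    # decomposition (4.6): t^T t = (Aa)^T (Aa) + ||(I_n - A) t||^2
    assert abs(sq_norm(t) - (sq_norm(Aa) + sq_norm(rest))) < 1e-12
    variances[name] = sq_norm(t)                             # E(SD) / sigma^2
    print(f"{name:18s} t^T t = {sq_norm(t):.2f}")
```

The smallest value of $t^Tt$ belongs to $t = Aa$, in line with the theorem; the other two candidates carry the additional non-negative summand of (4.6).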


Example 4.1 Let $p = 1$ in the model equation (4.1) so that (4.1) can be written as
$$Y = 1_n\theta_1 + e \quad (y_i = \theta_1 + e_i,\ i = 1, \ldots, n).$$
Here $1_n$ is the vector whose coordinates are all 1 (see also Appendix A). Assuming $-\infty < \theta_1 < \infty$ the parameter space $\Omega$ has the dimension 1. The estimator of $\theta_1$ according to the LSM is obtained by setting the derivative of $e^Te = f(\theta_1)$ to 0 as unique solution $\hat\theta_1 = \bar{y}$ of the equation
$$D = \frac{\partial (Y - 1_n\theta_1)^T(Y - 1_n\theta_1)}{\partial\theta_1} = -2\sum_{i=1}^{n} y_i + 2n\theta_1 = 0$$
(which produces a minimum, since the second derivative of $f(\theta_1)$ is positive). Therefore the parameter vector is estimated by $\hat\theta = 1_n\hat\theta_1$. The parameter space $\Omega$ has the orthonormal basis $c_1 = \frac{1}{\sqrt{n}}1_n$. Using the notations of the proofs of Theorems 4.1 and 4.2, we get
$$C = c_1, \quad CC^T = c_1c_1^T = A = \begin{pmatrix} \frac{1}{n} & \cdots & \frac{1}{n} \\ \vdots & & \vdots \\ \frac{1}{n} & \cdots & \frac{1}{n} \end{pmatrix}.$$
The orthogonal decomposition of $Y$ given in the proof of Theorem 4.1 is realised with $k_1 = \sqrt{n}\,\bar{y}$ and $c = \bar{y}\,1_n$. Additionally in this special case, the general statement $c = \hat\theta = AY$ holds. The variance of $\hat\theta = 1_n\bar{y}$ is
$$\operatorname{var}(\hat\theta) = A\sigma^2 = \begin{pmatrix} \frac{\sigma^2}{n} & \cdots & \frac{\sigma^2}{n} \\ \vdots & & \vdots \\ \frac{\sigma^2}{n} & \cdots & \frac{\sigma^2}{n} \end{pmatrix}.$$
It is easy to show that $A$ is idempotent and has rank 1 (see also Exercise 4.5).


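Theorem 4.4 If $\hat\theta$ is the LSM for $\theta$ in (4.1) with $\operatorname{var}(e) = \sigma^2 I_n$, then
$$s^2 = \frac{1}{n-p}\|Y - \hat\theta\|^2 = \frac{1}{n-p}\,Y^T(I_n - A)Y \tag{4.7}$$
is an unbiased estimator for $\sigma^2$.


Proof: We have to show
$$E\left[\|Y - \hat\theta\|^2\right] = \sigma^2(n-p). \tag{4.8}$$
Regarding $\hat\theta = AY$ and the idempotence of both $A$ and $I_n - A$, we obtain
$$E\left[\|Y - \hat\theta\|^2\right] = E\left[Y^T(I_n - A)Y\right] = E(Y^TY) - E(Y^TAY). \tag{4.9}$$
Now
$$E(Y^TBY) = \operatorname{tr}(B\Sigma) + \mu^TB\mu$$
if $E(Y) = \theta = \mu$ and $\operatorname{var}(Y) = \Sigma = \sigma^2I_n$. This implies with $B = I_n$
$$E(Y^TI_nY) = \sigma^2\operatorname{tr}(I_n) + \theta^T\theta = \sigma^2n + \theta^T\theta$$
and with $B = A$
$$E(Y^TAY) = \sigma^2\operatorname{tr}(A) + \theta^TA\theta = \sigma^2p + \theta^T\theta.$$
The difference of both equations leads to (4.8), which finishes the proof.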


4.1.2 Maximum Likelihood Method 


In this section, in addition to the linear model conditions, that is, (4.1) and the constraints, we suppose that the random vector $e$ in (4.1) follows an n-dimensional normal distribution $N(0_n, \sigma^2I_n)$. Then $Y$ is distributed as $N(\theta, \sigma^2I_n)$. Now we look for an MLE, an estimator for $\theta$ according to the maximum likelihood method. The likelihood function has the form
$$L(\theta, \sigma^2 \mid Y) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma^2}\|Y - \theta\|^2\right), \quad \theta \in \Omega,\ (\theta^T, \sigma^2) \in \Omega^*,\ \Omega^* = \Omega \times (0, \infty). \tag{4.10}$$
According to this method we get MLEs for $\theta$ and $\sigma^2$. We start with determining the log-likelihood function from (4.10):
$$\ln L = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(Y - \theta)^T(Y - \theta). \tag{4.11}$$
Now we want to maximise $\ln L$ under the constraint $A\theta = \theta$ (i.e. $\theta \in \Omega$), where $A$ is the matrix of orthogonal projection of $R^n$ onto $\Omega$. We denote the values maximising $L$ and $\ln L$, respectively, by $\tilde\theta$ and $\tilde\sigma^2$. We use the Lagrange method. Introducing the Lagrange multiplier $\lambda$ for $A\theta = \theta$ and differentiating the modified function (4.11) partially with respect to $\lambda$, $\theta$ and $\sigma^2$, we get after setting the derivatives to zero and replacing the variables $\theta$ and $\sigma^2$ by $\tilde\theta$ and $\tilde\sigma^2$ the simultaneous equations (4.12), which have unique solutions supplying the maximum, since the matrix of second partial derivatives is negative definite. If we use the random variable $\boldsymbol{Y}$ in the solutions instead of its realisation $Y$, we find the MLEs
$$\tilde\sigma^2 = \frac{1}{n}\|\boldsymbol{Y} - \tilde\theta\|^2, \tag{4.13}$$
$$\tilde\theta = A\boldsymbol{Y} = \hat\theta. \tag{4.14}$$
The MLE $\tilde\theta$ is identical with the LSM $\hat\theta$. The MLE $\tilde\sigma^2$ is biased, but consistent.
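
The different behaviour of $s^2$ from (4.7) and of the MLE (4.13) can be made visible by a small simulation. This is a sketch under assumed settings (the model of Example 4.1 with $p = 1$, $n = 5$, $\sigma^2 = 4$ and a hypothetical value $\theta_1 = 10$): the average of $s^2$ should be close to $\sigma^2$, the average of the MLE close to $\frac{n-p}{n}\sigma^2 = 3.2$.

```python
import random
from statistics import fmean


def simulate(n=5, sigma=2.0, theta1=10.0, repetitions=20_000, seed=7):
    """Average of s^2 (4.7) and of the MLE (4.13) in the model of Example 4.1 (p = 1)."""
    rng = random.Random(seed)
    p = 1
    s2_values, mle_values = [], []
    for _ in range(repetitions):
        y = [rng.gauss(theta1, sigma) for _ in range(n)]
        y_bar = fmean(y)                                  # theta_hat = 1_n * y_bar
        rss = sum((yi - y_bar) ** 2 for yi in y)          # ||Y - theta_hat||^2
        s2_values.append(rss / (n - p))                   # unbiased estimator (4.7)
        mle_values.append(rss / n)                        # MLE (4.13)
    return fmean(s2_values), fmean(mle_values)


mean_s2, mean_mle = simulate()
print(f"sigma^2 = 4.0, mean s^2 = {mean_s2:.2f}, mean MLE = {mean_mle:.2f}")
```

Since the factor $\frac{n-p}{n}$ tends to 1 for growing $n$, the bias of the MLE vanishes asymptotically, which is in line with its consistency.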


Theorem 4.5 If $Y$ is distributed with the likelihood function (4.10) and $\dim(\Omega) = p$, then $c^T\hat\theta = c^T\tilde\theta$ is for each vector $c = (c_1, \ldots, c_n)^T$ with real numbers $c_i$ the uniformly variance-optimal unbiased estimator (UVUE) for $c^T\theta$, and $s^2$ in (4.7) is a UVUE for $\sigma^2$ (compare Definition 2.3).


Proof: The assertion follows from Theorem 2.4 in relation to Example 2.4, because $E(c^T\hat\theta) = c^T\theta$ and $E(s^2) = \sigma^2$ as well as $c^T\hat\theta$ and $s^2$ are complete sufficient statistics.


4.1.3. Tests of Hypotheses 


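The linear hypothesis $H_0: \theta \in \omega$ with the $(p - q)$-dimensional linear subspace $\omega \subset \Omega$ is to be tested against the alternative hypothesis $H_A: \theta \notin \omega$. We design a likelihood quotient test by introducing
$$Q = \frac{\sup_{\theta \in \omega,\ \sigma^2} L(\theta, \sigma^2 \mid Y)}{\sup_{\theta \in \Omega,\ \sigma^2} L(\theta, \sigma^2 \mid Y)} \tag{4.15}$$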


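where $Y$ is again supposed to be distributed as $N(\theta, \sigma^2I_n)$. After passing to random variables, $Q$ itself or a monotone function of $Q$, considered as a function of $Y$, is to be used as test statistic. We denote the values of $\sigma^2$ and $\theta$ that maximise the function $L$ from (4.15), given in (4.10), on $\omega$ by $\tilde\sigma_\omega^2$ and $\tilde\theta_\omega$. Additionally, let $B$ be the idempotent matrix which orthogonally projects $R^n$ onto $\omega$. After passing from the realisations to the random variables, analogously to (4.13) and (4.14), we get
$$\tilde\sigma_\omega^2 = \frac{1}{n}\|Y - \tilde\theta_\omega\|^2, \tag{4.16}$$
$$\tilde\theta_\omega = BY. \tag{4.17}$$
Regarding
$$\sup_{\theta \in \omega,\ \sigma^2} L(\theta, \sigma^2 \mid Y) = (2\pi\tilde\sigma_\omega^2)^{-\frac{n}{2}} \exp\left(-\frac{\|Y - \tilde\theta_\omega\|^2}{\frac{2}{n}\|Y - \tilde\theta_\omega\|^2}\right) = (2\pi\tilde\sigma_\omega^2)^{-\frac{n}{2}} e^{-\frac{n}{2}}$$
and
$$\sup_{\theta \in \Omega,\ \sigma^2} L(\theta, \sigma^2 \mid Y) = (2\pi\tilde\sigma^2)^{-\frac{n}{2}} e^{-\frac{n}{2}}$$
with $\tilde\sigma^2 = \frac{1}{n}\|Y - AY\|^2$ from (4.13), the likelihood ratio (4.15) becomes after passing to random variables
$$Q = \left(\frac{\tilde\sigma^2}{\tilde\sigma_\omega^2}\right)^{\frac{n}{2}} = \left[\frac{\|Y - AY\|^2}{\|Y - BY\|^2}\right]^{\frac{n}{2}}. \tag{4.18}$$
We consider a monotone function $F = F(Q)$ of $Q$, namely (in the random form),
$$F = \left(Q^{-\frac{2}{n}} - 1\right)\frac{n-p}{q} = \frac{Y^T(A - B)Y}{Y^T(I_n - A)Y} \cdot \frac{n-p}{q} \tag{4.19}$$
where $q = \operatorname{rk}(A - B)$, to allow calculations on the basis of tabulated distributions. Theorem 4.7 clarifies the distribution behind (4.19).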

We repeat without proof a theorem from probability theory, which is needed 
here and also later. 


Theorem 4.6 Theorem of Cochran (1934)

If $Y$ is distributed as $N(1_n\mu, I_n)$, then the positive semi-definite quadratic forms $Y^TA_iY$ $(i = 1, 2, \ldots, k)$ of rank $n_i$ are independently of each other distributed as $CS(n_i, \lambda_i)$ with the non-centrality parameters $\lambda_i = (1_n\mu)^TA_i(1_n\mu)$ if and only if at least two of the three following conditions are fulfilled:

1) All $A_i$ are idempotent.

2) $\sum_{i=1}^{k} A_i$ is idempotent.

3) $A_iA_j = 0$ for all $i \ne j$.


Corollary 4.1 If $Y$ is distributed as $N(1_n\mu, I_n)$ and if $Y^TY = \sum_{i=1}^{k} Y^TA_iY$, then the quadratic forms $Y^TA_iY$ $(i = 1, 2, \ldots, k)$ of rank $n_i$ are mutually independently distributed as $CS(n_i, \lambda_i)$ with $n_i = \operatorname{rk}(A_i)$ and the non-centrality parameters $\lambda_i = (1_n\mu)^TA_i(1_n\mu)$ if and only if either

• all $A_i$ are idempotent

or

• $A_iA_j = 0$ for all $i \ne j$

or

• $\sum_{i=1}^{k} \operatorname{rk}(A_i) = \operatorname{rk}\left(\sum_{i=1}^{k} A_i\right) = n$.


Corollary 4.2 If $Y$ is distributed as $N(1_n\mu, \sigma^2I_n)$ and if $Y^TY = \sum_{i=1}^{k} Y^TA_iY$ with $n_i = \operatorname{rk}(A_i)$, then each of the three conditions of Corollary 4.1 is necessary and sufficient for the fact that the quadratic forms $(1/\sigma^2)Y^TA_iY$ $(i = 1, 2, \ldots, k)$ are distributed independently of each other as $CS(n_i, \lambda_i)$ with the non-centrality parameters $\lambda_i = (1/\sigma^2)(1_n\mu)^TA_i(1_n\mu)$.

Now we come to the announced statement about the distribution of F in (4.19). 


Theorem 4.7 Let $Y$ be distributed as $N(\theta, \sigma^2I_n)$ and let $A$ and $B$ be idempotent matrices that project $R^n$ orthogonally onto $\Omega$ and onto $\omega \subset \Omega$, respectively (where $\mathrm{rk}(A) = p$, $\mathrm{rk}(B) = p - q$). Then $F$ in (4.19) is distributed as $F(q, n-p, \lambda)$ with the non-centrality parameter $\lambda = (1/\sigma^2)\,\theta^T(A - B)\theta$ and the degrees of freedom $q$ and $n - p$.


Proof: Since $A$ is the orthogonal projector of $R^n$ onto the $p$-dimensional subspace $\Omega$ and $B$ the orthogonal projector onto the $(p-q)$-dimensional subspace $\omega \subset \Omega$, we get $AB = BA = B$. Hence, $I_n - A$ and $A - B$ are idempotent. With the notations $A_1 = I_n - A$, $A_2 = A - B$ and $A_3 = B$, the conditions of Theorem 4.6 are satisfied. Regarding Corollary 4.2 to this theorem, $(1/\sigma^2)\,Y^T(I_n - A)Y$ and $(1/\sigma^2)\,Y^T(A - B)Y$ are mutually independent and distributed as $CS(n-p)$ and $CS(q, \lambda)$, respectively, with the non-centrality parameter $\lambda = (1/\sigma^2)\,\theta^T(A - B)\theta$, which supplies the assertion.

Using results in Section 4.1.1 we can show

$$E\left[Y^T(I_n - A)Y\right] = \sigma^2(n-p), \qquad E\left[Y^T(A - B)Y\right] = \sigma^2 q + \sigma^2\lambda.$$


If the interim results for calculating $F$ are to be represented in a clear way, an analysis of variance table is often used (see Table 4.1).

If $H_0$ is true, the non-centrality parameter becomes $\lambda = 0$, and $F$ is centrally F-distributed with degrees of freedom $q$ and $n - p$. $H_0$ is rejected if

$$F > F_{1-\alpha}(q, n-p) = F(q, n-p \mid 1-\alpha),$$

where the quantile $F_{1-\alpha}(q, n-p)$ is chosen so that

$$\max_{\theta \in \omega} P\{F > F_{1-\alpha}(q, n-p)\} = \alpha \tag{4.20}$$

is the significance level of the test. The power function is

$$\beta(\theta, \lambda) = P\left\{\frac{qF}{qF + n - p} > \frac{qF_{1-\alpha}(q, n-p)}{qF_{1-\alpha}(q, n-p) + n - p}\right\}. \tag{4.21}$$

Table 4.1 Analysis of variance table for calculating the test statistic for the test of the hypothesis $H_0: \theta \in \omega \subset \Omega$.

| Source of variation | Sum of squares SS | Degrees of freedom df | Mean square MS = SS/df | E(MS) | F |
|---|---|---|---|---|---|
| Total | $Y^TY$ | $n$ | | | |
| Null hypothesis $\theta \in \omega$ | $Y^T(A-B)Y$ | $q$ | $\frac{1}{q}Y^T(A-B)Y$ | $\sigma^2 + \frac{\sigma^2\lambda}{q}$ | $\frac{n-p}{q}\cdot\frac{Y^T(A-B)Y}{Y^T(I_n-A)Y}$ |
| Residual | $Y^T(I_n-A)Y$ | $n-p$ | $\frac{1}{n-p}Y^T(I_n-A)Y$ | $\sigma^2$ | |
| Alternative hypothesis $\theta \notin \omega$ | $Y^TBY$ | $p-q$ | | | |
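
The quantities in Table 4.1 can be computed directly from the two projections. The following Python sketch is an illustration that is not part of the original text: it assumes a one-way layout with three groups of three made-up observations, takes $\Omega$ as the span of the group indicator vectors and $\omega$ as the span of the vector of ones, and uses hypothetical helper names (`orthonormal_basis`, `project`, `f_statistic`). It uses the fact that $Y^T(A-B)Y = \|AY - BY\|^2$ because $A - B$ is an orthogonal projector.

```python
from math import fsum


def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return fsum(a * b for a, b in zip(u, v))


def orthonormal_basis(columns, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal basis of the span of the columns."""
    basis = []
    for col in columns:
        w = [float(c) for c in col]
        for b in basis:
            coef = dot(w, b)
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors that are linearly dependent
            basis.append([wi / norm for wi in w])
    return basis


def project(basis, y):
    """Orthogonal projection of y onto the span of an orthonormal basis."""
    out = [0.0] * len(y)
    for b in basis:
        coef = dot(b, y)
        out = [oi + coef * bi for oi, bi in zip(out, b)]
    return out


def f_statistic(y, spanning_big, spanning_small):
    """F from (4.19); the small space is assumed to lie inside the big one."""
    basis_big = orthonormal_basis(spanning_big)      # basis of Omega
    basis_small = orthonormal_basis(spanning_small)  # basis of omega
    p = len(basis_big)
    q = p - len(basis_small)
    ay = project(basis_big, y)    # AY
    by = project(basis_small, y)  # BY
    ss_hyp = fsum((a - b) ** 2 for a, b in zip(ay, by))   # Y'(A - B)Y
    ss_res = fsum((yi - a) ** 2 for yi, a in zip(y, ay))  # Y'(I_n - A)Y
    return (len(y) - p) / q * ss_hyp / ss_res


# Made-up data: three groups of three observations each.
y = [1, 2, 3, 2, 3, 4, 5, 6, 7]
group_indicators = [[1.0 if i // 3 == g else 0.0 for i in range(9)]
                    for g in range(3)]
ones = [[1.0] * 9]
F = f_statistic(y, group_indicators, ones)
print(F)
```

For these data the sums of squares are $Y^T(A-B)Y = 26$ and $Y^T(I_n-A)Y = 6$ with $q = 2$ and $n - p = 6$, so the sketch should reproduce $F = 13$ up to rounding.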


It can be shown (see Witting and Nolle, 1970, p. 37) that this test is invariant with respect to the group of affine transformations in $R^n$; thus the test problem is also invariant. Among all tests that are invariant with respect to these transformations, the F-test is a uniformly most powerful $\alpha$-test.

Each linear hypothesis can be written in a basic form by using a suitable trans- 
formation of the sample space. 


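Definition 4.2 A linear hypothesis $\theta^* \in \omega$ according to Definition 4.1 is said to be in canonical form if

$$\theta^* \in \Omega \ \text{ means } \ \theta^*_{p+1} = \cdots = \theta^*_n = 0$$

and

$$\theta^* \in \omega \ \text{ means } \ \theta^*_1 = \cdots = \theta^*_q = \theta^*_{p+1} = \cdots = \theta^*_n = 0.$$


Theorem 4.8 Each linear hypothesis $H_0: \theta \in \omega$ can be transformed by an orthogonal transformation of the model equation (4.1) into canonical form so that

$$Y^T(A - B)Y = z_1^2 + \cdots + z_q^2, \qquad Y^T(I_n - A)Y = z_{p+1}^2 + \cdots + z_n^2$$

is satisfied, and the distribution in (4.19) remains unchanged.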


Proof: Let $P$ be an orthogonal matrix of size $(n, n)$. We put $Y = PZ$ and $\theta = P\theta^*$. W.l.o.g. we choose $P$ so that

$$P^T(A - B)P = \begin{pmatrix} I_q & O \\ O & O \end{pmatrix}, \qquad P^TBP = \begin{pmatrix} O & O & O \\ O & I_{p-q} & O \\ O & O & O \end{pmatrix}$$

and

$$P^T(I_n - A)P = \begin{pmatrix} O & O \\ O & I_{n-p} \end{pmatrix},$$

which is always possible. For simplicity the sizes of the null matrices are omitted here. Then (4.1) is transferred into

$$Z = \theta^* + e^*, \quad \text{where } Z \in R^n,\ \theta^* \in \Omega^*,\ e^* = P^Te,\ Z^T = (z_1, \dots, z_n).$$

Like $B$, the matrix $P^TBP$ is an orthogonal projector of $R^n$ onto a $(p-q)$-dimensional subspace $\omega^*$, and that means

$$H_0: \theta^*_i = \begin{cases} 0 & \text{for } i = 1, \dots, q, p+1, \dots, n, \\ \text{arbitrary} & \text{for } i = q+1, \dots, p. \end{cases}$$

Besides we find

$$Y^T(A - B)Y = Z^TP^T(A - B)PZ = z_1^2 + \cdots + z_q^2$$

and

$$Y^T(I_n - A)Y = Z^TP^T(I_n - A)PZ = z_{p+1}^2 + \cdots + z_n^2.$$

The non-centrality parameter of the numerator in (4.19) is

$$\lambda = \frac{1}{\sigma^2}\left(\theta_1^{*2} + \cdots + \theta_q^{*2}\right).$$

It is equal to 0 if and only if $\theta^*_1 = \cdots = \theta^*_q = 0$, that is, if and only if $H_0$ is true.
According to this, Theorem 4.8 can also be applied for testing linear hypotheses in canonical form.


Definition 4.3 A linear contrast of the parameter vector $\theta$ is a linear functional $c^T\theta$ with $c = (c_1, \dots, c_n)^T$ and $\sum_{i=1}^n c_i = 0$. Two linear contrasts $c_1^T\theta$ and $c_2^T\theta$ are said to be orthogonal (linear contrasts) if $c_1^Tc_2 = 0$.


Now we are able to express the null hypothesis $\theta \in \omega$ by orthogonal contrasts. Let $n - p$ pairwise orthogonal contrasts $c_i^T\theta$ $(i = 1, \dots, n-p)$ be given that are equal to 0. Under this condition the hypothesis $H_0$, in which $q$ further contrasts $t_j^T\theta$ $(j = 1, \dots, q)$, pairwise orthogonal and orthogonal to the given $c_i^T\theta$, are also 0, is to be tested against the alternative hypothesis that at least one of the additional contrasts $t_j^T\theta$ is different from 0. We put $C = (c_1, \dots, c_{n-p})$ and $T = (t_1, \dots, t_q)$. Now the condition $C^T\theta = 0_{n-p}$ defines the $p$-dimensional null space $\Omega$, that is, $C^T\theta = 0_{n-p}$ is equivalent to $\theta \in \Omega$. Correspondingly the hypothesis $H_0: C^T\theta = 0_{n-p} \wedge T^T\theta = 0_q$ is equivalent to $\theta \in \omega$. Hence, the hypothesis containing the contrasts can be tested with $F$ from (4.19). This test statistic can be rewritten in another form, as shown in the next theorem.


Theorem 4.9 We consider $n - p + q$ orthogonal contrasts $c_i^T\theta$ $(i = 1, \dots, n-p)$ and $t_j^T\theta$ $(j = 1, \dots, q)$. Then the notations $C = (c_1, \dots, c_{n-p})$ and $T = (t_1, \dots, t_q)$ imply $C^TT = O$. Besides, $C^TC = D_1$ and $T^TT = D_2$ are diagonal matrices.

Now let $c_i^T\theta = 0$ $(i = 1, \dots, n-p)$ and $Y$ be distributed as $N(\theta, \sigma^2I_n)$. Then the test statistic of the linear hypothesis $H_0: t_j^T\theta = 0$ for all $j = 1, \dots, q$ $(\theta \in \omega)$ can be written with the estimator $\hat\theta$ as

$$F = \frac{n-p}{q}\cdot\frac{\sum_{j=1}^q \frac{1}{\|t_j\|^2}\left(t_j^T\hat\theta\right)^2}{Y^T(I_n - A)Y}, \tag{4.22}$$

where $A$ is again the orthogonal projector onto $\Omega$.


Proof: The first assertions are evident. Finally we have to show that the term $Y^T(A - B)Y$ in the numerator of (4.19) has the form

$$Y^T(A - B)Y = \sum_{j=1}^q \frac{1}{\|t_j\|^2}\left(t_j^T\hat\theta\right)^2,$$

where the matrices $A$ and $B$ are the projectors onto $\Omega$ and $\omega$, respectively. Then the difference $A - B$ is the orthogonal projector of $R^n$ onto the subspace $\omega^\perp \cap \Omega$, and we get

$$\theta = B\theta + (A - B)\theta$$

for $\theta \in \Omega$. The columns of $T$ form a basis of $\omega^\perp \cap \Omega$, and the columns of $P$ in $A - B = PP^T$ form an orthonormal basis of $\omega^\perp \cap \Omega$. Consequently a non-singular matrix $H$ exists so that $T = PH$ and $P = TH^{-1}$, respectively, as well as $A - B = PP^T = T(H^TH)^{-1}T^T$ hold. Since $T^TT = H^TP^TPH = H^TH$, it follows that $A - B = T(T^TT)^{-1}T^T$ and

$$Y^T(A - B)Y = Y^TA(A - B)AY = \hat\theta^TT(T^TT)^{-1}T^T\hat\theta. \tag{4.23}$$

This implies the assertion because $T^TT$ is a diagonal matrix.
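
The identity behind (4.22) can be checked numerically. The sketch below is an illustration that is not part of the original text: it assumes a one-way layout with three groups of three made-up observations, so that $\hat\theta$ consists of the repeated group means, and it uses two hypothetical contrast vectors `t1` and `t2` that are orthogonal to each other and to the vector of ones. The sum of the scaled squared contrasts is compared with $Y^T(A-B)Y$ computed directly as the sum of squared deviations of the fitted values from the grand mean.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


# Made-up data: three groups of three observations each.
y = [1, 2, 3, 2, 3, 4, 5, 6, 7]
group_means = [sum(y[3 * g:3 * g + 3]) / 3 for g in range(3)]
theta_hat = [group_means[i // 3] for i in range(9)]  # AY
grand_mean = sum(y) / len(y)                         # entries of BY

# Two orthogonal contrasts spanning the part of Omega orthogonal to omega.
t1 = [1, 1, 1, -1, -1, -1, 0, 0, 0]    # group 1 versus group 2
t2 = [1, 1, 1, 1, 1, 1, -2, -2, -2]    # groups 1 and 2 versus group 3


def contrast_ss(t, theta):
    """One summand of the numerator in (4.22): (t' theta_hat)^2 / ||t||^2."""
    return dot(t, theta) ** 2 / dot(t, t)


ss_contrasts = contrast_ss(t1, theta_hat) + contrast_ss(t2, theta_hat)
ss_direct = sum((th - grand_mean) ** 2 for th in theta_hat)  # Y'(A - B)Y
print(ss_contrasts, ss_direct)
```

Both values should agree (here 26 up to rounding), with the two contrasts contributing 1.5 and 24.5, which shows how the hypothesis sum of squares splits into single-degree-of-freedom parts.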


4.1.4 Construction of Confidence Regions 


As in the previous subsections, we assume that $Y$ is distributed as $N(\theta, \sigma^2I_n)$. In this subsection, methods are presented that can be used to construct confidence regions for linear combinations. The condition $\theta \in \Omega$ is also written as $C^T\theta = 0$, $C^TC = D_1$.

Theorem 4.10 Let $Y$ be distributed as $N(\theta, \sigma^2I_n)$ so that the condition $C^T\theta = 0$ $(\theta \in \Omega)$ of the linear model (4.1) is fulfilled. If $C^TT = O$ also holds, then a confidence region for $T^T\theta$ with the coefficient $1 - \alpha$ is given by

$$\frac{1}{qs^2}\left(\hat\theta^TT - \theta^TT\right)\left(T^TAT\right)^{-1}\left(T^T\hat\theta - T^T\theta\right) \le F_{1-\alpha}(q, n-p). \tag{4.24}$$

In (4.24) the matrix $A$ is the projector of $R^n$ onto $\Omega$, $s^2$ is the estimator for $\sigma^2$ according to (4.7) and $q$ is the rank of $T$.

Proof: Regarding (4.19), (4.23), (4.7) and Theorem 4.7, the assumptions above imply that the statistic

$$F = \frac{1}{qs^2}\left(\hat\theta^TT - \theta^TT\right)\left(T^TAT\right)^{-1}\left(T^T\hat\theta - T^T\theta\right)$$

is centrally F-distributed as $F(q, n-p)$, if $E(T^T\hat\theta) = T^T\theta$ is taken into account. Hence, the assertion is true.


Example 4.2 Let $T = t$ be a $(n \times 1)$-vector (i.e. $q = 1$). Then it follows from the Gauss-Markov theorem (Theorem 4.3) that the LSM $\hat L$ of $L = t^T\theta$ is equal to $\hat L = t^T\hat\theta = t^TAY$. We put $t^TA = a^T$. Because of $T^TAT = T^TAAT = a^Ta$, we get as a special case of (4.24) with focus on $L$

$$\frac{1}{s^2\|a\|^2}\left(\hat L - L\right)^2 \le F_{1-\alpha}(1, n-p) = t^2\left(n-p, 1-\frac{\alpha}{2}\right). \tag{4.25}$$

This supplies for $L$ the $(1-\alpha)$-confidence interval

$$\left[\hat L - s\|a\|\,t\left(n-p, 1-\frac{\alpha}{2}\right),\ \hat L + s\|a\|\,t\left(n-p, 1-\frac{\alpha}{2}\right)\right]. \tag{4.26}$$


4.1.5 Special Linear Models 


Example 4.3 Regression Analysis
Let $X$ be a $(n \times p)$-matrix of rank $p < n$ so that $\Omega$ in (4.1) is the rank space of $X$ ($R[X] = \Omega$); that is, for a certain $\beta \in R^p$, we have

$$\theta = X\beta. \tag{4.27}$$

Since both $X$ and $X^TX$ have the same rank $p$, the inverse matrix $(X^TX)^{-1}$ exists. Then (4.27) implies $\beta = (X^TX)^{-1}X^T\theta$. According to the Gauss-Markov theorem (Theorem 4.3) we get from (4.4) the estimator

$$\hat\beta = (X^TX)^{-1}X^TAY, \tag{4.28}$$

where $A$ is again the orthogonal projector of $R^n$ onto $\Omega$. Consequently there exists a matrix $P$ whose columns form an orthonormal basis of $\Omega$ so that $A = PP^T$.

Since $\Omega$ is the rank space of $X$, the columns of $X$ also form a basis of $\Omega$. Hence, there is a non-singular matrix $H$ with $P = XH^{-1}$. As $A = XH^{-1}(H^T)^{-1}X^T$ is idempotent, it has to be $A = X(X^TX)^{-1}X^T$. If this representation of $A$ is used in (4.28), we obtain

$$\hat\beta = (X^TX)^{-1}X^TY. \tag{4.29}$$

With this form of $A$, the formula for $s^2$ in (4.7) becomes

$$s^2 = \frac{1}{n-p}\left\|Y - X(X^TX)^{-1}X^TY\right\|^2 = \frac{1}{n-p}\,Y^T\left(I_n - X(X^TX)^{-1}X^T\right)Y.$$
Now we want to test the hypothesis

$$K^T\beta = a \tag{4.30}$$

under the assumption that $Y$ is distributed as $N(X\beta, \sigma^2I_n)$, where $K^T$ is a $(q \times p)$-matrix of rank $q$ and $a$ is a $(q \times 1)$-vector. According to Definition 4.1, the hypothesis (4.30) is not a linear hypothesis in the case $a \neq 0$. But (4.30) can be linearised as follows. We put

$$Z = Y - Xc, \qquad \theta^* = \theta - Xc, \qquad \gamma = \beta - c,$$

where $c$ is chosen so that $K^Tc = a$. Considering the linear model

$$Z = \theta^* + e \tag{4.31}$$

with $\theta^* = \theta - Xc = X\beta - Xc = X\gamma$, the hypothesis $H_0: K^T\beta = a$ becomes the linear hypothesis

$$H_0: K^T\gamma = K^T\beta - K^Tc = 0_q.$$

Now $H_0: K^T\gamma = 0$ can be tested for the model equation (4.31) using the test statistic (4.19) with inclusion of formula (4.23). The test statistic has the form

$$F = \frac{Z^TT(T^TT)^{-1}T^TZ}{Z^T(I_n - A)Z}\cdot\frac{n-p}{q},$$

where $T$ is, as in Section 4.1.3, the matrix occurring in the hypothesis $H_0: \theta^* \in \omega$ $(C^T\theta^* = 0 \wedge T^T\theta^* = 0)$.

The matrix $T$ can be expressed by $K^T$ and $X$. Because of $\theta^* = X\gamma$ we get $\gamma = (X^TX)^{-1}X^T\theta^*$ and $K^T\gamma = K^T(X^TX)^{-1}X^T\theta^*$. Therefore $T^T = K^T(X^TX)^{-1}X^T$. The equation $K^Tc = a$ is satisfied by $c = K(K^TK)^{-1}a$. If besides $Z = Y - Xc = Y - XK(K^TK)^{-1}a$ is used, the test statistic reads

$$F = \frac{\left(Y - XK(K^TK)^{-1}a\right)^TX(X^TX)^{-1}K\left[K^T(X^TX)^{-1}K\right]^{-1}K^T(X^TX)^{-1}X^T\left(Y - XK(K^TK)^{-1}a\right)}{\left(Y - XK(K^TK)^{-1}a\right)^T\left(I_n - X(X^TX)^{-1}X^T\right)\left(Y - XK(K^TK)^{-1}a\right)}\cdot\frac{n-p}{q}$$

$$= \frac{\left(K^T\hat\beta - a\right)^T\left[K^T(X^TX)^{-1}K\right]^{-1}\left(K^T\hat\beta - a\right)}{Y^T\left(I_n - X(X^TX)^{-1}X^T\right)Y}\cdot\frac{n-p}{q}, \tag{4.32}$$

since $X^T[I_n - X(X^TX)^{-1}X^T] = O$.

The hypothesis $K^T\beta = a$ that can be tested by (4.32) is very general. From Theorem 4.7 it follows that $F$ in (4.32) is non-centrally F-distributed as $F(q, n-p, \lambda)$. The non-centrality parameter is

$$\lambda = \frac{\left(K^T\beta - a\right)^T\left[K^T(X^TX)^{-1}K\right]^{-1}\left(K^T\beta - a\right)}{\sigma^2}.$$

It vanishes if the null hypothesis is true.
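
The estimator (4.29) and the test statistic (4.32) can be evaluated with a few lines of code. The following Python sketch is an illustration that is not part of the original text: it assumes a made-up simple linear regression with four observations and tests the hypothesis that the slope is zero ($K = (0, 1)^T$, $a = 0$). The helper names (`transpose`, `matmul`, `solve`, `regression_f`) are hypothetical, and linear systems are solved by plain Gauss-Jordan elimination rather than by a numerical library.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def solve(a, b):
    """Solve a x = b for a square non-singular a; b holds columns of
    right-hand sides. Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc
                          for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regression_f(X, Y, K, a):
    """Return beta_hat from (4.29) and F from (4.32).
    X: n x p (full column rank), Y: length n, K: p x q, a: length q."""
    n, p, q = len(X), len(X[0]), len(K[0])
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    beta = solve(XtX, matmul(Xt, [[v] for v in Y]))     # (X'X)^{-1} X'Y
    fitted = matmul(X, beta)
    rss = sum((yi - fi[0]) ** 2 for yi, fi in zip(Y, fitted))
    Kt = transpose(K)
    d = [[kb[0] - ai] for kb, ai in zip(matmul(Kt, beta), a)]  # K'b - a
    M = matmul(Kt, solve(XtX, K))                       # K'(X'X)^{-1}K
    u = solve(M, d)
    numerator = sum(di[0] * ui[0] for di, ui in zip(d, u))
    return [b[0] for b in beta], numerator / rss * (n - p) / q


# Made-up data: intercept column and one regressor.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
Y = [1, 2, 2, 3]
beta_hat, F = regression_f(X, Y, K=[[0], [1]], a=[0])
print(beta_hat, F)
```

For these data the sketch should give $\hat\beta = (1.1, 0.6)^T$, a residual sum of squares of 0.2 and $F = 18$ with 1 and 2 degrees of freedom, up to rounding.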


Example 4.4 Analysis of Variance
As in Example 4.3 let $X$ be a $(n \times p)$-matrix, but now of rank $r < p$. Using (4.27) the model equation (4.1) becomes

$$Y = X\beta + e.$$

Since the rank of $X$ is smaller than $p$, the inverse matrix $(X^TX)^{-1}$ does not exist. Consequently $\beta$ cannot be uniquely determined from $\theta$. The quantities $\beta = \beta^*$ that minimise

$$S = \|Y - X\beta\|^2 = (Y - X\beta)^T(Y - X\beta)$$

are the solutions of the Gaussian normal equations

$$X^TX\beta = X^TY \tag{4.33}$$

belonging to $X\beta = Y$. These equations also arise if the derivative

$$\frac{\partial S}{\partial \beta} = 2X^TX\beta - 2X^TY$$

is put to 0. (A minimum is reached for $\beta = \beta^*$, since the matrix $2X^TX$ of second derivatives is positive semi-definite, so that $S$ is convex.)

Let $G$ be a generalised inverse (or also inner inverse) of $X^TX$ defined by the relation $X^TXGX^TX = X^TX$. Then a solution of (4.33) can be written as

$$\beta^* = GX^TY.$$

We will see later that it makes no sense to call $\beta^*$ an estimator for $\beta$. Naturally $X\beta^* = \hat\theta$ is an estimator for $\theta$ because $XGX^T$ in $\hat\theta = XGX^TY$ is independent of the choice of $G$. Some further considerations need the concept of an estimable function.


Definition 4.4 A linear function $q^T\beta$ of the parameter vector $\beta$ is said to be estimable if it is equal to at least one linear function $p^TE(Y)$ of the expectation vector of the random variable $Y$ in the model equation

$$Y = X\beta + e.$$


Theorem 4.11 Let a random variable $Y$ be given that satisfies the model equation $Y = X\beta + e$ with a $(n \times p)$-matrix $X$. Then it follows:

a) The expectations of all components of $Y$ are estimable.

b) If all $q_j^T\beta$ $(j = 1, \dots, k)$ are estimable functions, then the linear combination $L = \sum_{j=1}^k c_jq_j^T\beta$ ($c_j$ real) is also an estimable function.

c) The function $q^T\beta$ is estimable iff $q^T$ can be written in the form $q^T = p^TX$ with a certain vector $p$.

d) If $q^T\beta$ is estimable, then $q^T\beta^*$ is independent of the special solution $\beta^*$ of the normal equations (4.33).

e) The best linear unbiased estimator (BLUE) of an estimable function $q^T\beta$ is $\widehat{q^T\beta} = q^T\beta^*$, where $\beta^*$ is the random variable of solutions of (4.33).
Proof:

a) If the $i$-th (coordinate) unit vector is chosen for $p$ in $p^TE(Y)$, then $E(y_i) = p^TE(Y)$ arises, which is estimable.

b) $q_j^T\beta = p_j^TE(Y)$ implies $L = \sum_{j=1}^k c_jp_j^TE(Y) = p^TE(Y)$ with $p = \sum_{j=1}^k c_jp_j$.

c) Starting with $E(Y) = X\beta$ and $q^T\beta = p^TE(Y)$, it follows that $q^T\beta = p^TX\beta$. Since estimability is a property that does not depend on $\beta$, the latter relation must hold for all $\beta$. Hence, $q^T = p^TX$. On the other hand, if $q^T = p^TX$, then $q^T\beta$ is obviously estimable.

d) We have $q^T\beta^* = p^TX\beta^* = p^TXGX^TY$, where $G$ is a generalised inverse of $X^TX$. Since $XGX^T$ is independent of the special choice of $G$, $q^T\beta^*$ does not depend on the special choice of a solution $\beta^*$ in (4.33).

e) Equation (4.33) implies that $q^T\beta^*$ is linear in $Y$ and also that

$$E(q^T\beta^*) = q^TE(GX^TY) = q^TGX^TE(Y)$$

is fulfilled. Since $Y = X\beta + e$ leads to $E(Y) = X\beta$, we obtain

$$E(q^T\beta^*) = q^TGX^TX\beta.$$

Because of (c) we can put $q^T = p^TX$ so that $E(q^T\beta^*) = p^TXGX^TX\beta$. Regarding $XGX^TX = X$ this supplies that $q^T\beta^*$ is unbiased. We need the equation $q^TGX^TX = q^T$ again in the equivalent form $q = X^TXG^Tq$. The variance $\mathrm{var}(\beta^*)$ can be written as

$$\mathrm{var}(\beta^*) = \mathrm{var}(GX^TY) = GX^T\,\mathrm{var}(Y)\,XG^T = GX^TXG^T\sigma^2.$$

Therefore we get

$$\mathrm{var}(q^T\beta^*) = q^TGX^TXG^Tq\,\sigma^2 = \underbrace{q^TGX^TX}_{q^T}\,G\,\underbrace{X^TXG^Tq}_{q}\,\sigma^2 = q^TGq\,\sigma^2.$$

We have to show that the variance of an arbitrary linear combination $c^TY$ of $Y$ with $E(c^TY) = q^T\beta$ cannot fall below the variance just obtained. Unbiasedness has the consequence $c^TX = q^T$ if $c^TE(Y) = c^TX\beta$ is taken into account. Now we get

$$\mathrm{cov}(q^T\beta^*, c^TY) = q^TGX^Tc\,\sigma^2 = q^TGX^TXG^Tq\,\sigma^2 = q^TGq\,\sigma^2$$

and

$$\mathrm{var}(q^T\beta^* - c^TY) = \mathrm{var}(q^T\beta^*) + \mathrm{var}(c^TY) - 2\,\mathrm{cov}(q^T\beta^*, c^TY) = \mathrm{var}(c^TY) - q^TGq\,\sigma^2 = \mathrm{var}(c^TY) - \mathrm{var}(q^T\beta^*).$$

Since $\mathrm{var}(q^T\beta^* - c^TY)$ is non-negative, $\mathrm{var}(c^TY) \ge \mathrm{var}(q^T\beta^*)$ follows. Therefore the estimator of $q^T\beta$ is a BLUE.
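
Parts (c) and (d) of Theorem 4.11 can be illustrated with a small rank-deficient design. The Python sketch below is an illustration that is not part of the original text: it assumes a one-way layout with two groups and the over-parametrised vector $\beta = (\mu, a_1, a_2)^T$, uses made-up data, and compares two different solutions of the normal equations (4.33). The names `sol_a`, `sol_b`, `q_estimable` and `q_not_estimable` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Design matrix with columns (1, group 1 indicator, group 2 indicator);
# its rank is 2 < p = 3 because the first column is the sum of the others.
X = [[1, 1, 0],
     [1, 1, 0],
     [1, 0, 1],
     [1, 0, 1]]
Y = [1, 3, 4, 6]

columns = list(zip(*X))
XtX = [[dot(ci, cj) for cj in columns] for ci in columns]
XtY = [dot(ci, Y) for ci in columns]


def solves_normal_equations(beta):
    """Check X'X beta = X'Y from (4.33) for a candidate solution."""
    return all(abs(dot(row, beta) - rhs) < 1e-12
               for row, rhs in zip(XtX, XtY))


sol_a = [0, 2, 5]   # solution with mu = 0 (group means 2 and 5)
sol_b = [2, 0, 3]   # solution with a_1 = 0

# a_1 - a_2 is estimable: q' = p'X with p = (1, 0, -1, 0).
q_estimable = [0, 1, -1]
# mu alone is not estimable: (1, 0, 0) is not a linear combination
# of the rows of X.
q_not_estimable = [1, 0, 0]

print(solves_normal_equations(sol_a), solves_normal_equations(sol_b))
print(dot(q_estimable, sol_a), dot(q_estimable, sol_b))
print(dot(q_not_estimable, sol_a), dot(q_not_estimable, sol_b))
```

Both vectors solve the normal equations, and the estimable difference $a_1 - a_2$ takes the same value $-3$ for both, whereas the non-estimable $\mu$ takes the values 0 and 2 depending on the solution chosen.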


The estimability of a linear combination of $\beta$ is connected with the testability of a hypothesis, which is introduced next.


Definition 4.5 A hypothesis $H: K^T\beta = a$ with $\beta$ from the model $Y = X\beta + e$ is said to be testable if the functions $k_i^T\beta$ are estimable for all $i$ $(i = 1, \dots, q)$, where $k_i$ are the columns of $K$, that is, if $K^T$ can be written as $P^TX$ with a certain $(n \times q)$-matrix $P$.


In Definition 4.5 we can also write $K = (k_1, \dots, k_q)$, $K^T\beta = (k_i^T\beta)$ and $P = (p_1, \dots, p_q)$. If the hypothesis $H$ is testable, then the validity of $K^T\beta^* = a$ does not depend on the choice of the solution $\beta^*$ in (4.33).

Now we want to find a test statistic for a testable null hypothesis $H_0: K^T\beta = a$. We know that $K^T\beta^*$ is an estimator for $K^T\beta$ that is invariant with respect to $\beta^*$. It is also unbiased, since (because of $X = XGX^TX$)

$$E(K^T\beta^*) = K^TE(\beta^*) = K^TGX^TE(Y) = K^TGX^TX\beta = P^TXGX^TX\beta = P^TX\beta = K^T\beta.$$

We can derive a test statistic for the hypothesis $K^T\beta = a$ similar to Example 4.3, where $Y$ is again supposed to be distributed as $N(X\beta, \sigma^2I_n)$. All conversions leading to (4.32) can be adopted; only $(X^TX)^{-1}$ has to be replaced by the generalised inverse $G$ of $X^TX$. Hence, instead of (4.32) we get

$$F = \frac{\left(Y - XK(K^TK)^{-1}a\right)^TXGK\left[K^TGK\right]^{-1}K^TGX^T\left(Y - XK(K^TK)^{-1}a\right)}{\left(Y - XK(K^TK)^{-1}a\right)^T\left(I_n - XGX^T\right)\left(Y - XK(K^TK)^{-1}a\right)}\cdot\frac{n-p}{q}.$$

We have only to show that $T^TT = K^TGX^TXG^TK = K^TGK$. Regarding $K^T = P^TX$ and $X = XGX^TX$ or $X^T = X^TXG^TX^T$, we find indeed

$$K^TGX^TXG^TK = P^TXGX^TXG^TX^TP = P^TXGX^TP = K^TGK.$$

The numerator of $F$ (ignoring the scalar factor at the end) can be rewritten as $(K^T\beta^* - a)^T(K^TGK)^{-1}(K^T\beta^* - a)$ if

$$a = K^TK(K^TK)^{-1}a = P^TXK(K^TK)^{-1}a = P^TXGX^TXK(K^TK)^{-1}a = K^TGX^TXK(K^TK)^{-1}a$$

is considered. Therefore the test statistic of the testable hypothesis $K^T\beta = a$ reads

$$F = \frac{\left(K^T\beta^* - a\right)^T\left(K^TGK\right)^{-1}\left(K^T\beta^* - a\right)}{Y^T\left(I_n - XGX^T\right)Y}\cdot\frac{n-p}{q}, \tag{4.34}$$

since $X^T(I_n - XGX^T) = O$.

According to Theorem 4.7 the statistic $F$ in (4.34) is non-centrally F-distributed as $F(q, n-p, \lambda)$ with degrees of freedom $q$ and $n - p$, where, as in Theorem 4.7, $p$ in the degrees of freedom stands for the dimension of $\Omega$ (here $\mathrm{rk}(X) = r$), and with the non-centrality parameter

$$\lambda = \frac{1}{\sigma^2}\left(K^T\beta - a\right)^T\left(K^TGK\right)^{-1}\left(K^T\beta - a\right).$$

If $H_0: K^T\beta = a$ is true, then $\lambda = 0$ follows.

Example 4.5 Covariance Analysis
Often it happens that the matrix $X$ in Example 4.4 contains some linearly independent columns. This suggests representing $X$ in the form $X = (W, Z)$, where $W$ is a $(n \times s)$-matrix of rank $r < s$ and $Z$ is a $(n \times k)$-matrix of rank $k$ (with linearly independent columns). Obviously $s + k = p$. Now it is natural to split also $\beta = \begin{pmatrix} \alpha \\ \gamma \end{pmatrix}$ so that (4.1) obtains the form

$$Y = W\alpha + Z\gamma + e. \tag{4.35}$$

The parameter space $\Omega$ is the rank space of $X$, that is, $\Omega = R[X]$. If $R[W] \cap R[Z] = \{0\}$, then $\Omega$ is (equal to) the direct sum $R[W] \oplus R[Z]$ of these two rank spaces.

In the following, it is supposed on the one hand that the columns of $Z$ are linearly independent and on the other hand that the columns of $W$ do not linearly depend on columns of $Z$.

The model equation (4.35) can be considered not only as a mixture of the model equations used in Example 4.3 and in Example 4.4 but also as a special case of the model equation in Example 4.4. We obtain from (4.33)

$$X^TX\beta^* = \begin{pmatrix} W^TW & W^TZ \\ Z^TW & Z^TZ \end{pmatrix}\begin{pmatrix} \alpha^* \\ \gamma^* \end{pmatrix} = \begin{pmatrix} W^TY \\ Z^TY \end{pmatrix}. \tag{4.36}$$

If $G_W$ denotes a generalised inverse of $W^TW$ and $G$ a generalised inverse of $Z^T(I_n - WG_WW^T)Z$ (in the sense we used it before), then $\alpha^*$ and $\gamma^*$ can be determined in (4.36) as

$$\alpha^* = G_W\left(W^TY - W^TZ\gamma^*\right) = G_WW^TY - G_WW^TZ\gamma^* = \alpha_0^* - G_WW^TZ\gamma^*$$

and

$$\gamma^* = GZ^T\left(I_n - WG_WW^T\right)Y.$$

Here $\alpha_0^*$ denotes a solution of (4.36) in the case $\gamma^* = 0$.

Since $S = I_n - WG_WW^T$ is idempotent, the matrices $SZ$ and $Z^TSZ = Z^TSSZ$ have the same rank. Because of $\mathrm{rk}(SZ) = \mathrm{rk}(Z)$ (the columns of $W$ are by assumption no linear combinations of columns of $Z$), the inverse $(Z^TSZ)^{-1}$ exists and we get

$$\gamma^* = (Z^TSZ)^{-1}Z^TSY = \hat\gamma.$$

Therefore $\gamma^* = \hat\gamma$ is (together with the corresponding $\alpha^*$) not only a special solution of (4.36) but also the unique one. Hence $\hat\gamma$ is an estimator for $\gamma$. As we see, $\gamma$ is estimable. Besides, $q^T\alpha$ is always estimable if it is estimable in a model with $\gamma = 0$. The representation $\hat\gamma = (Z^TSSZ)^{-1}Z^TSY$ implies that $\hat\gamma$ is the LSM of $\gamma$ in the model $Y = SZ\gamma + e$.

We want to derive a test statistic for the hypothesis $H_0: \gamma = 0$. If we put $\theta = W\alpha + Z\gamma$, then $\Omega$ in (4.1) is a parameter space of dimension

$$p = \mathrm{rk}(W) + \mathrm{rk}(Z) = r + k.$$

The linear hypothesis $H_0: \gamma = 0$ corresponds to the parameter space $\omega$, whose dimension is $p - q = \mathrm{rk}(W) = r$. Hence the hypothesis $H_0: \gamma = 0$ can be tested using the statistic (4.19). Let $A$ again denote the orthogonal projector of $R^n$ onto $\Omega$ and $B$ the orthogonal projector of $R^n$ onto $\omega$. Regarding $\Omega \cap \omega^\perp = R[(I_n - B)Z]$ and $R[Z] \cap \omega = \{0\}$, we get

$$A - B = (I_n - B)Z\left(Z^T(I_n - B)Z\right)^{-1}Z^T(I_n - B) = SZ(Z^TSZ)^{-1}Z^TS.$$

Therefore $Y^T(A - B)Y = \hat\gamma^TZ^TSY$ and $Y^T(I_n - A)Y = Y^T(I_n - B)Y - \hat\gamma^TZ^TSY$. Hence, the hypothesis $H_0: \gamma = 0$ can be tested with

$$F = \frac{\hat\gamma^TZ^TSY}{Y^T(I_n - B)Y - \hat\gamma^TZ^TSY}\cdot\frac{n-r-k}{k}. \tag{4.37}$$

Moreover, $F$ is centrally F-distributed with $k$ and $n - r - k$ degrees of freedom if $H_0$ is true.

If the hypothesis $K^T\alpha = a$ is to be tested with the estimable function $K^T\alpha$, then the test statistic $F$ is applied as in Example 4.4.
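
For a classification matrix $W$ and a single covariate, the quantities in (4.37) reduce to within-group sums of squares and products. The following Python sketch is an illustration that is not part of the original text: it assumes that the columns of $W$ are the indicator vectors of two groups (so $r = 2$ and $S = I_n - B$ centres each observation at its group mean), that $Z$ has one column ($k = 1$), and that the data are made up. The helper names `center_within` and `ancova_f` are hypothetical.

```python
def center_within(values, labels):
    """Apply S = I_n - B: subtract the group mean from every value."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    return [v - means[g] for v, g in zip(values, labels)]


def ancova_f(y, z, labels):
    """gamma_hat = (Z'SZ)^{-1} Z'SY and F from (4.37) for one covariate."""
    sy = center_within(y, labels)                 # SY
    sz = center_within(z, labels)                 # SZ
    zsz = sum(v * v for v in sz)                  # Z'SZ
    zsy = sum(a * b for a, b in zip(sz, sy))      # Z'SY
    gamma_hat = zsy / zsz
    ysy = sum(v * v for v in sy)                  # Y'(I_n - B)Y
    n, r, k = len(y), len(set(labels)), 1
    ss_hyp = gamma_hat * zsy                      # Y'(A - B)Y
    F = ss_hyp / (ysy - ss_hyp) * (n - r - k) / k
    return gamma_hat, F


# Made-up data: two groups with three observations each.
labels = ["g1", "g1", "g1", "g2", "g2", "g2"]
z = [0, 1, 2, 1, 2, 3]
y = [1, 2, 4, 3, 5, 6]
gamma_hat, F = ancova_f(y, z, labels)
print(gamma_hat, F)
```

Here $Z^TSZ = 4$, $Z^TSY = 6$ and $Y^T(I_n - B)Y = 28/3$, so the sketch should return $\hat\gamma = 1.5$ and $F = 81$ with 1 and 3 degrees of freedom, up to rounding.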


4.1.6 The Generalised Least Squares Method (GLSM) 


Now we again want to consider the case where $V = \mathrm{var}(e) \neq I_n$ with a positive definite matrix $V$. Although it was shown after Definition 4.1 that $V = I_n$ can be taken by transforming the model, it is sometimes useful to get estimators for arbitrary positive definite matrices in a direct way (without transformation). We apply the same notations as in the special case (see the passages after Definition 4.1).

If we use the LSM relation (4.2) with the notations

$$V = P^TP, \quad Z = (P^T)^{-1}Y, \quad \lambda = (P^T)^{-1}\theta, \quad \hat\lambda = (P^T)^{-1}\hat\theta, \quad \Omega^* = (P^T)^{-1}\Omega,$$

then we get

$$\|Z - \hat\lambda\|^2 = \inf_{\lambda \in \Omega^*}\|Z - \lambda\|^2$$

and

$$\|Z - \hat\lambda\|^2 = \left((P^T)^{-1}Y - (P^T)^{-1}\hat\theta\right)^T\left((P^T)^{-1}Y - (P^T)^{-1}\hat\theta\right) = (Y - \hat\theta)^TP^{-1}(P^T)^{-1}(Y - \hat\theta) = (Y - \hat\theta)^TV^{-1}(Y - \hat\theta).$$

Analogously to the transformation (4.4), we have $\hat\lambda = BZ$ with an idempotent matrix $B$ of rank $p$. It follows

$$\hat\theta = P^TB(P^T)^{-1}Y \tag{4.38}$$

from $(P^T)^{-1}\hat\theta = B(P^T)^{-1}Y$ after multiplying both sides of the equation by $P^T$. This corresponds with $\hat\theta$ in (4.4) putting $A = P^TB(P^T)^{-1}$.

Regarding the case in Example 4.3 ($\theta = X\beta$, $\mathrm{rk}(X) = p$), we have $\lambda = (P^T)^{-1}X\beta = X^*\beta$. Further, analogously as in Example 4.3 we find

$$\hat\beta = (X^{*T}X^*)^{-1}X^{*T}Z = \left(X^TP^{-1}(P^T)^{-1}X\right)^{-1}X^TP^{-1}(P^T)^{-1}Y$$

and finally, after introducing random variables, the UVUE of $\beta$

$$\hat\beta = (X^TV^{-1}X)^{-1}X^TV^{-1}Y. \tag{4.39}$$

If $V$ is unknown, then (4.39) is often used with the estimator $\hat V$ instead of $V$, that is, we estimate $\beta$ by the quasi-UVUE

$$\hat\beta = (X^T\hat V^{-1}X)^{-1}X^T\hat V^{-1}Y. \tag{4.40}$$

$\hat V$ is the estimated covariance matrix of $Y$. If the structure of $X$ permits it (multiple measurements at single measuring points), $V$ is estimated from the observation values used for the estimation of $\beta$. In (4.40) the estimator $\hat\beta$ is neither linear nor unbiased.
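
The equivalence between the direct formula (4.39) and ordinary least squares after transforming the model can be checked in a simple special case. The Python sketch below is an illustration that is not part of the original text: it assumes a single regressor column (so $p = 1$) and a known diagonal covariance matrix $V$, for which $P = \mathrm{diag}(\sqrt{v_i})$ satisfies $V = P^TP$. The function names and the data are hypothetical.

```python
def gls_single_column(x, y, v):
    """(X'V^{-1}X)^{-1} X'V^{-1}Y from (4.39) for one column x and
    V = diag(v)."""
    num = sum(xi * yi / vi for xi, yi, vi in zip(x, y, v))
    den = sum(xi * xi / vi for xi, vi in zip(x, v))
    return num / den


def ols_after_transformation(x, y, v):
    """Ordinary least squares in the transformed model Z = X* beta + e*
    with Z = (P')^{-1} Y, X* = (P')^{-1} X and P = diag(sqrt(v))."""
    z = [yi / vi ** 0.5 for yi, vi in zip(y, v)]
    x_star = [xi / vi ** 0.5 for xi, vi in zip(x, v)]
    return (sum(a * b for a, b in zip(x_star, z))
            / sum(a * a for a in x_star))


# Made-up data: estimating a common mean from observations with
# unequal variances (the third observation is less precise).
x = [1, 1, 1]
y = [1, 2, 4]
v = [1, 1, 2]
beta_direct = gls_single_column(x, y, v)
beta_transformed = ols_after_transformation(x, y, v)
print(beta_direct, beta_transformed)
```

Both routes should give the weighted mean $(1 + 2 + 4/2)/(1 + 1 + 1/2) = 2$, whereas the unweighted mean of the observations is $7/3$.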


4.2 Linear Models with Random Effects: Mixed Models 


If in model equation (4.1) at least one component of $\theta$ is random and at least one component is an unknown fixed parameter, then the corresponding linear model is called a mixed model. So far, the theory of mixed models has not been developed to the same extent as the unified and complete theory of linear models with fixed effects. This is partly due to the diversity of the models. If we arrange $\theta$ in such an order that $\theta^T = (\theta_1^T, \theta_2^T)$ is written with an unknown parameter vector $\theta_1$ and a random vector $\theta_2$, then we can split up the matrix $X$ and the vector $\beta$ in (4.27) analogously. Then we find with $X = (X_1, X_2)$, $\beta^T = (\beta_1^T, \beta_2^T)$, the $(n \times p_1)$-matrix $X_1$, the $(n \times p_2)$-matrix $X_2$ and $p_1 + p_2 = p$ the following model variants:

$$Y = X_1\beta_1 + X_2\beta_2 + e \quad (X_2 \text{ random}), \tag{4.41}$$

$$Y = X_1\beta_1 + X_2\beta_2 + e \quad (\beta_2 \text{ random}), \tag{4.42}$$

$$Y = X_1\beta_1 + X_2\beta_2 + e \quad (X_2 \text{ and } \beta_2 \text{ random}). \tag{4.43}$$

All three models contain the linear model of Section 4.1 for $p_2 = 0$ as a special case. If $X_1\beta_1 = \mu 1_n$ ($\mu$ real), then each of the models (4.41) up to (4.43) is called model II. The other models with $p_2 > 0$ are called mixed models (in a stronger sense).

The special models in Section 4.1.5 are usually denoted in the following way (after the model name the chapter number is given, where this model is treated, and the model specification is recorded; the models of covariance analysis are omitted here for the sake of clarity):

• Model I of regression analysis (8): (4.41) with $\mathrm{rk}(X_1) = p_1 = p$, $p_2 = 0$.
• Model II of regression analysis (8): (4.41) with $X_1\beta_1 = \beta_0 1_n$ ($\beta_0$ real), $(Y_i, X_{i,p_1+1}, \dots, X_{ip})$ non-singular $(p_2+1)$-dimensional distributed with $p > 1$.
• Mixed model of regression analysis (8): (4.41) with $\mathrm{rk}(X_1) = p_1 > 1$, $(Y_i, X_{i,p_1+1}, \dots, X_{ip})$ non-singular $(p_2+1)$-dimensional distributed with $p_2 \ge 1$.
• Regression model I with random regressors (8): (4.42) with $\beta_1 = 0$ ($p_1 = 0$), $\mathrm{rk}(X_2) = p$ or with $X_1\beta_1 = \beta_0 1_n$ ($\beta_0$ real), $\mathrm{rk}(X_2) = p - 1$.
• Model I of analysis of variance (5): (4.42) with $X_2 = O$ (i.e. with $p = p_1$), $\mathrm{rk}(X_1) < p$.
• Model II of analysis of variance (6): (4.42) with $X_1\beta_1 = \mu 1_n$ ($\mu$ real), $\mathrm{rk}(X_2) < p - 1$.
• Mixed model of analysis of variance (7): (4.42) with $p_1 > 1$, $\mathrm{rk}(X_1) < p_1$, $p_2 \ge 1$, $\mathrm{rk}(X_2) < p_2$.


This list does not contain all possible models, but is focused on the ones described in the literature under the names given above.

In the mixed models some problems arise that are treated only briefly or not at all in the preceding chapters. This concerns the estimation of variance components and the optimal prediction of random variables. The following problems occur in the mixed models (4.41) and (4.42):

• Estimation of $\beta_1$
• Prediction of $X_2$ and $\beta_2$, respectively
• Estimation of $\mathrm{var}(\beta_2)$


The estimation of $\beta_1$ can in principle be done with the methods described in Section 4.1, but there are also methods of interest that estimate $\beta_1$ and $\mathrm{var}(\beta_2)$ together in an optimal way, based on a combined loss function. Prediction methods are briefly discussed in Section 4.2.1. Methods for estimating variance matrices $\mathrm{var}(\beta_2)$ of special structure are dealt with in Section 4.2.2.


4.2.1 Best Linear Unbiased Prediction (BLUP) 
We introduce a new concept, that of prediction. 
Definition 4.6 Model equation (4.42) is considered with $E(e) = 0_N$. Further, let $V = \mathrm{var}(Y \mid \beta_2)$ be positive definite, $E(\beta_2) = b_2$, $\mathrm{var}(\beta_2) = B$ be positive definite, $\beta_1$ be known and $\mathrm{cov}(e, \beta_2) = O_{N,p_2}$. A linear function in $Y$ of the form

$$L = a^T(Y - X_1\beta_1) \quad (a = (a_1, \dots, a_N)^T,\ a_i \text{ real}) \tag{4.44}$$

is said to be an unbiased prediction, or briefly $L$ is said to belong to the set $D_{UP}$ of unbiased predictions, if

$$E[K - L] = 0, \tag{4.45}$$

and it is said to be a best linear unbiased prediction (BLUP) of $K = c^T\beta_2$, $c^T = (c_1, \dots, c_{p_2})$, if $L$ is from $D_{UP}$ and

$$\mathrm{var}[K - L] = \min_{L^* \in D_{UP}} \mathrm{var}[K - L^*] \tag{4.46}$$

is fulfilled for all $V$, $b_2$, $B$ and $X_1\beta_1$.

Analogously BLUPs can be defined for linear combinations of elements from $X_2$ in model equation (4.41). Here we restrict ourselves to the case of Definition 4.6, since it is representative for all models.


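Theorem 4.12 The BLUP of $c^T\beta_2 = K$ (for unknown $b_2$) is given under the conditions of Definition 4.6 by

$$L = a^T(Y - X_1\beta_1),$$

where

$$a = V^{-1}X_2\left(X_2^TV^{-1}X_2\right)^{-1}c, \tag{4.47}$$

provided that $D_{UP}$ has at least one element and $X_2^TV^{-1}X_2$ is positive definite. Then

$$\mathrm{var}(K - L) = c^T\left(X_2^TV^{-1}X_2\right)^{-1}c. \tag{4.48}$$

Proof: First we show $L \in D_{UP}$, that is, (4.45). Namely, we have

$$E[K - L] = c^TE(\beta_2) - a^TE(Y - X_1\beta_1) = c^Tb_2 - a^TX_2b_2 = c^Tb_2 - c^T\left(X_2^TV^{-1}X_2\right)^{-1}X_2^TV^{-1}X_2b_2 = 0.$$

Now let $L^* = a^{*T}(Y - X_1\beta_1)$ be an arbitrary element from $D_{UP}$, that is, $X_2^Ta = X_2^Ta^* = c$ is fulfilled. Next we find

$$\mathrm{var}\left[c^T\beta_2 - a^{*T}(Y - X_1\beta_1)\right] = \mathrm{var}\left[c^T\beta_2 - a^{*T}Y\right] = \mathrm{var}\left[c^T\beta_2 - a^TY + a^TY - a^{*T}Y\right].$$

Since

$$\mathrm{var}(Y) = E[\mathrm{var}(Y \mid \beta_2)] + \mathrm{var}[E(Y \mid \beta_2)] = V + X_2BX_2^T$$

and analogously

$$\mathrm{cov}(Y, c^T\beta_2) = E[\mathrm{cov}(Y, c^T\beta_2 \mid \beta_2)] + \mathrm{cov}[E(Y \mid \beta_2), c^T\beta_2] = X_2Bc$$

holds, it follows (the term containing $X_2BX_2^T$ vanishes because of $(a - a^*)^TX_2 = 0$)

$$\mathrm{cov}\left[c^T\beta_2 - a^TY,\ a^TY - a^{*T}Y\right] = (a - a^*)^TX_2Bc - (a - a^*)^TVa.$$

Because of $a^TX_2 = a^{*T}X_2 = c^T$, the first summand on the right-hand side is equal to 0, and the second summand becomes with $a$ in (4.47)

$$(a - a^*)^TVV^{-1}X_2\left(X_2^TV^{-1}X_2\right)^{-1}c = (a - a^*)^TX_2\left(X_2^TV^{-1}X_2\right)^{-1}c,$$

which is also equal to 0. This implies

$$\mathrm{var}\left[c^T\beta_2 - a^{*T}Y\right] = \mathrm{var}\left[c^T\beta_2 - a^TY\right] + (a - a^*)^T\mathrm{var}(Y)(a - a^*)$$

and consequently

$$\mathrm{var}\left[c^T\beta_2 - a^{*T}Y\right] \ge \mathrm{var}\left[c^T\beta_2 - a^TY\right],$$

which completes the first part of the proof.
The equation (4.48) follows by considering

$$\mathrm{var}(K - L) = \mathrm{var}(K) + \mathrm{var}(L) - 2\,\mathrm{cov}(K, L) = c^TBc + a^TVa + a^TX_2BX_2^Ta - 2c^TBX_2^Ta = a^TVa$$

and replacing $a$ by its representation (4.47).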
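
The weights (4.47) and the variance (4.48) become transparent in the simplest possible case. The Python sketch below is an illustration that is not part of the original text: it assumes $p_2 = 1$, $X_2 = 1_N$, $c = 1$ and a diagonal matrix $V = \mathrm{diag}(v_i)$ with made-up entries, so that $a_i = (1/v_i)/\sum_j (1/v_j)$. It also compares the BLUP weights with equal weights $1/N$, another unbiased choice, using $\mathrm{var}(K - L) = a^TVa$ from the proof. The function names are hypothetical.

```python
def blup_weights(v):
    """a = V^{-1} X_2 (X_2' V^{-1} X_2)^{-1} c from (4.47) for
    X_2 = 1_N, c = 1 and V = diag(v)."""
    total_precision = sum(1.0 / vi for vi in v)   # X_2' V^{-1} X_2
    return [(1.0 / vi) / total_precision for vi in v]


def prediction_variance(a, v):
    """var(K - L) = a'Va for weights a with X_2'a = c and V = diag(v)."""
    return sum(ai * ai * vi for ai, vi in zip(a, v))


v = [1.0, 1.0, 2.0]                      # made-up conditional variances
a_blup = blup_weights(v)
a_equal = [1.0 / len(v)] * len(v)        # also satisfies X_2'a = 1

var_blup = prediction_variance(a_blup, v)
var_equal = prediction_variance(a_equal, v)
var_formula = 1.0 / sum(1.0 / vi for vi in v)   # right-hand side of (4.48)
print(a_blup, var_blup, var_equal, var_formula)
```

The weights should come out as $(0.4, 0.4, 0.2)$ with $\mathrm{var}(K - L) = 0.4$, in agreement with (4.48), while equal weights give the larger value $4/9$.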


Practical applications of this method are predictions of values of the regressand (predictand) in linear regression or predictions of random effects in mixed models of analysis of variance to determine the breeding values of sires, where $X_1\beta_1$ is often unknown; cf. Rasch and Herrendörfer (1989).


4.2.2 Estimation of Variance Components 


In models of type (4.42), the goal is often to estimate the variance $\mathrm{var}(\beta_2)$ of $\beta_2$ in the case $\mathrm{rk}(X_1) < p_1$, $\mathrm{rk}(X_2) < p_2$. If $B = \mathrm{var}(\beta_2)$ is a diagonal matrix, then the diagonal elements are called variance components, and the factor $\sigma^2$ in $\mathrm{var}(e) = \sigma^2I_N$ is called the variance component of the residual (of the error) and is to be estimated too. There are important reasons to restrict ourselves to so-called quadratic estimators.


Definition 4.7 Let $Y$ be a random vector satisfying the model equation (4.42) and $\mathrm{var}(\beta_2) = B$ be a diagonal matrix with the diagonal elements $\sigma_j^2$ $(j = 1, \dots, p_2)$. Further, let $\sigma_0^2 = \sigma^2$ and $\mathrm{cov}(\beta_2, e) = O$. The random variable $Q = Y^TAY$ is said to be a quadratic estimator with respect to a linear combination $W = \sum_{j=0}^{p_2} c_j\sigma_j^2$. It is said to be a quadratic unbiased estimator with respect to $W$ if $E(Q) = W$. Further, $Q$ is said to be an invariant quadratic estimator if

$$Q = Y^TAY = (Y - X_1\beta_1)^TA(Y - X_1\beta_1) \tag{4.49}$$

(i.e. if $AX_1 = O$). Finally, using the notations $c^T = (c_0, c_1, \dots, c_{p_2})$ and $C = \mathrm{diag}(c)$ for the corresponding diagonal matrix, a quadratic estimator $Q$ is said to be of minimal norm if the expression
II|C- Xz AXal|| = III II| 

with the matrix $A$ from $Q$ becomes minimal in a chosen matrix norm $|||\cdot|||$. Usually the spectral norm is used (which is induced by the Euclidean vector norm). Rao (1970, 1971a, 1971b, 1971c) introduced for invariant unbiased estimators of minimal norm the name MINQUE (minimum norm quadratic unbiased estimator).


There are a lot of papers about such estimators. Estimation methods for special 
models of analysis of variance can be found in Chapter 7. Following are some 
hints regarding the literature of general theory. 

In many cases with positive scalar $W$, we would hesitate to accept negative estimates (remembering that an estimator is defined as a mapping into the parameter space).

But estimation principles such as MINQUE, the method of analysis of variance described in Chapter 6, as well as the maximum likelihood method or a modified maximum likelihood estimation (REML: restricted maximum likelihood) yield negative estimates with positive probability for normally distributed $Y$; see Verdooren (1980, 1988).
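
How a negative estimate can arise is easy to see in the balanced one-way classification. The Python sketch below is an illustration that is not part of the original text: it assumes the standard analysis of variance estimator for this layout, $(MS_{\text{between}} - MS_{\text{within}})/n$ for the variance component of the random factor with $n$ observations per level (a formula not derived in this section), and uses made-up data in which all level means coincide. The function name is hypothetical.

```python
def anova_variance_components(groups):
    """ANOVA-method estimates for a balanced one-way random model.
    Returns (estimate of the factor variance component,
             estimate of the residual variance)."""
    a = len(groups)              # number of levels
    n = len(groups[0])           # observations per level (balanced)
    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / a
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (a - 1)
    ms_within = sum((y - m) ** 2
                    for g, m in zip(groups, means)
                    for y in g) / (a * n - a)
    return (ms_between - ms_within) / n, ms_within


# Made-up data: identical level means, but variation within the levels.
groups = [[1, 5], [2, 4], [3, 3]]
sigma_a2_hat, sigma2_hat = anova_variance_components(groups)
print(sigma_a2_hat, sigma2_hat)
```

Here the mean square between levels is 0 and the mean square within levels is $10/3$, so the estimate of the variance component is $-5/3$, a value outside the parameter space.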

Pukelsheim (1981) discusses in a survey possibilities for guaranteeing 
non-negative unbiased estimators. Using the MINQUE principle, he states a 
sufficient condition for the existence of corresponding estimators; see also 
Verdooren (1988). 

Henderson (1953) published a first paper about methods for estimating 
variance components. Anderson et al. (1984) describe optimal estimations of 
variance components for arbitrary excess (kurtosis) of the distribution of e. 

The books of Sahai and Ojeda (2004, 2005) give a thorough overview of the state of the art in the special field of estimating variance components.


4.3 Exercises 


4.1 Assume that $C$ is a $(n \times p)$-matrix whose columns form an orthonormal basis of the $p$-dimensional linear subspace $\Omega$ of $R^n$. Prove that the condition $C^Tb = 0_p$ $(b \in R^n)$ defines the $(n-p)$-dimensional orthogonal complement of $\Omega$ in $R^n$.


4.2 Prove that the solutions $\beta^*$ of the Gaussian normal equations $X^TX\beta = X^TY$ supply a minimum of the squared (Euclidean) norm $f(\beta) = \|Y - X\beta\|^2$.
Hint: Show that the matrix of second derivatives of $f(\beta)$ is positive semi-definite (positive definite if $X$ has full column rank).


4.3 Show that $X = XGX^TX$ is fulfilled, where $G$ is a generalised (inner) inverse of $X^TX$.


4.4 Show that the relation $X^T(I_n - XGX^T) = O$ is satisfied if $G$ is a generalised (inner) inverse of $X^TX$.


4.5 Show that the matrix $A$ in

$$A = \begin{pmatrix} \frac{1}{n} & \cdots & \frac{1}{n} \\ \vdots & & \vdots \\ \frac{1}{n} & \cdots & \frac{1}{n} \end{pmatrix}$$

is idempotent and has the rank 1.
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Analysis of Variance (ANOVA) - Fixed Effects Models 
(Model I of Analysis of Variance)


5.1. Introduction 


An experimenter often has to find out in an experiment whether different values of one variable or of several variables have different effects on the experimental material. The variables investigated in an experiment are called factors; their values are called factor levels. If the effects of several factors have to be examined, the conventional method is to vary only one of these factors at a time and to keep all other factors constant. To investigate the effect of $p$ factors this way, $p$ experiments have to be conducted. This approach is not only very labour intensive; the results at the levels of the factor investigated may also depend on the constant levels of the remaining factors, which means that interactions between the factors exist. The British statistician R. A. Fisher recommended experimental designs in which the levels of all factors are varied at the same time. For the statistical analysis of the experimental results of such designs (they are called factorial experiments; see Chapter 12), Fisher developed a statistical procedure, the analysis of variance (ANOVA). The first publication on this topic was Fisher and Mackenzie (1923), a paper about the analysis of field trials at Fisher's workplace, Rothamsted Experimental Station in Harpenden (UK). A good overview is given in Scheffé (1959).

The ANOVA is based on the decomposition of the sum of squared deviations
of the observations from the total mean of the experiment into components.
Each of the components is assigned to a specific factor or to the experimental
error. Further, the degrees of freedom belonging to the sums of squared
deviations are decomposed correspondingly. The ANOVA is mainly used
to test statistical hypotheses (model I) or to estimate components of variance
that can be assigned to the different factors (model II; see Chapter 6).

The ANOVA can be applied to several problems based on mathematical
models called model I, model II and mixed model, respectively. The problem
leading to model I is as follows: all factor levels have been specifically
selected and included in the experiment because just these levels are of
practical interest. The objective of the experiment is to find out whether the
effects of the different levels (or factor level combinations) differ significantly
or only randomly from each other. The experimental question can be answered
by a statistical test if particular assumptions are fulfilled. The statistical
conclusion refers to the (finitely many) factor levels specifically selected. The
problem leading to model II is as follows: the levels of the factors are a random
sample from a universe of possible levels. The objective of the experiment is to
draw a conclusion about the universe of all levels of a factor by estimating the
proportion of the total variance that can be traced back to the variation of the
factors or to test a hypothesis about these proportions of the total variance.

The problems in model I are the estimation of the effects and interaction
effects of the several factor levels and testing the significance of these effects.
The problems in model II are the estimation of the components of variance of
several factors or factor combinations and testing hypotheses concerning these
components. The estimation of components of variance is discussed in Chapter 6.

In all chapters we also give hints concerning the design of experiments. 


Remarks about Program Packages 


In the analysis of the examples, we also show the calculations without program
packages, although we assume that the reader will usually analyse such data with
program packages like R, SPSS or SAS. We therefore give a short introduction
to IBM SPSS Statistics and, for sample size determination, to the R-package
OPDOE. IBM SPSS Statistics is very extensive and is not free of charge.
The reader finds more information via
www.ibm.com/marketplace/cloud/statistical-analysis-and-reporting/us/en-us.

With the program package R (free via CRAN: http://www.r-project.org or
https://cran.r-project.org/), several analyses as well as experimental designs
including sample size determination can be done. First one has to install
and start R. The functions for experimental designs are then made available
via the commands


install.packages("OPDOE")
and
library("OPDOE")


Now one can calculate the sample size for the analysis of variance (or for short
ANOVA) via size.anova and find help by


help(size.anova)


In SPSS for the one-way ANOVA, we use either the path
Analyze
Compare Means
One-Way ANOVA
or, mainly for higher classifications, the path
Analyze
General Linear Models
Univariate

Definition 5.1 We start with a model

$$Y = X\beta + e, \quad R[X] = \Omega \tag{5.1}$$

where $Y$ is a $N(X\beta, \sigma^2 I_N)$-distributed $N$-dimensional random variable, $e$ is a
$N(0_N, \sigma^2 I_N)$-distributed $N$-dimensional random variable, $\beta$ is a $[(a+1) \times 1]$
vector of parameters and $X$ a $[N \times (a+1)]$ matrix of rank $p < a+1 < N$. Then (5.1) is
the equation of model I of the ANOVA.


If we drop the assumption of normal distribution in the parameter estimation,
we obtain BLUE instead of UVUE (see Chapter 2). That is the case in the
sequel. If in point estimation normal distribution is given, then read UVUE in
place of BLUE. In hypothesis testing and confidence estimation, the normal
distribution in Definition 5.1 is essential and will be assumed in those cases.

We explain this definition by a simple example. 


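Example 5.1 From $a$ populations $G_1, \ldots, G_a$, random samples $Y_1, \ldots, Y_a$ of
dimension (or as we also say of size) $n_1, \ldots, n_a$ have been drawn independently
from each other. We write $Y_i = (y_{i1}, \ldots, y_{in_i})^T$. The $Y_i$ are distributed in the
populations $G_i$ as $N(\{\mu_i\}, \sigma^2 I_{n_i})$ with $\{\mu_i\} = (\mu_i, \ldots, \mu_i)^T$. Further we write
$\mu_i = \mu + a_i$ $(i = 1, \ldots, a)$. Then we have

$$y_{ij} = \mu + a_i + e_{ij} \quad (i = 1, \ldots, a;\ j = 1, \ldots, n_i). \tag{5.2}$$

Writing $\beta = (\mu, a_1, \ldots, a_a)^T$ and $Y^T = (Y_1^T, \ldots, Y_a^T)$, then $Y$ is a $(N \times 1)$ vector by
putting $N = \sum_{i=1}^{a} n_i$. Now we can write (5.2) in the form (5.1) if

$$e = (e_{11}, \ldots, e_{1n_1}, \ldots, e_{a1}, \ldots, e_{an_a})^T$$

as well as

$$X^T = \begin{pmatrix}
1 \cdots 1 & 1 \cdots 1 & \cdots & 1 \cdots 1 \\
1 \cdots 1 & 0 \cdots 0 & \cdots & 0 \cdots 0 \\
0 \cdots 0 & 1 \cdots 1 & \cdots & 0 \cdots 0 \\
\vdots & \vdots & & \vdots \\
0 \cdots 0 & 0 \cdots 0 & \cdots & 1 \cdots 1
\end{pmatrix}$$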

and $X = \left(1_N, \bigoplus_{i=1}^{a} 1_{n_i}\right)$, respectively.

In Example 4.4, we have shown that in general no unique MSE of $\beta$ exists
because the normal equations have infinitely many solutions. Let $\beta^*$ be any
solution of the normal equations

$$X^T X \beta^* = X^T Y.$$

Let $G = (X^TX)^-$ be a generalised inverse of $X^TX$. Then we have

$$\beta^* = G X^T Y. \tag{5.3}$$
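
The structure of $X$ and of the normal equations can be made concrete with a small Python sketch. It builds $X = (1_N, \bigoplus 1_{n_i})$ for group sizes chosen only for illustration and forms $X^TX$; the function names `design_matrix` and `gram` and the sizes (4, 3, 3) are assumptions made for this sketch, not part of the text.

```python
def design_matrix(sizes):
    """Rows of X = (1_N, direct sum of 1_{n_i}) for the one-way layout."""
    a = len(sizes)
    rows = []
    for i, n_i in enumerate(sizes):
        for _ in range(n_i):
            # first column: general mean; column i + 1: indicator of level i
            rows.append([1] + [1 if k == i else 0 for k in range(a)])
    return rows


def gram(X):
    """Return X^T X as a list of lists."""
    cols = list(zip(*X))
    return [[sum(u * v for u, v in zip(c1, c2)) for c2 in cols] for c1 in cols]


sizes = (4, 3, 3)            # hypothetical n_1, n_2, n_3
X = design_matrix(sizes)
XtX = gram(X)

# The first row of X^T X equals the sum of the remaining rows, so X^T X is
# singular and the normal equations have infinitely many solutions.
row_sum = [sum(col) for col in zip(*XtX[1:])]
print(XtX)
print(row_sum == XtX[0])
```

Because the rows of $X^TX$ are linearly dependent, an ordinary inverse does not exist, which is why a generalised inverse appears in (5.3).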


If we choose a $[(a+1-p) \times (a+1)]$ matrix $B$ of rank $a+1-p$ so that

$$\operatorname{rk}\begin{pmatrix} X \\ B \end{pmatrix} = a+1$$

and

$$B\beta = 0, \tag{5.4}$$

then by the side condition (5.4) the generalised inverse $G$ of $X^TX$ is uniquely
determined and equal to $G = (X^TX + B^TB)^{-1}$. By this also $\beta^*$ is uniquely
determined (i.e. $\beta$ in (5.1) is uniquely defined) and equal to the MSE (MLE):

$$\hat\beta = (X^TX + B^TB)^{-1} X^T Y. \tag{5.5}$$

This leads to


Theorem 5.1 If $B$ in (5.4) is a matrix whose rank space $R[B]$ is orthogonal to
the rank space $R[X]$ of the matrix $X$ in (5.1) and if
$\operatorname{rk}(H) = \operatorname{rk}\begin{pmatrix} X \\ B \end{pmatrix} = a+1$ and
the side condition (5.4) is fulfilled, then $\beta$ in (5.1) is estimable by (5.5).


Proof: We minimise $r = \|Y - X\beta\|^2 + \lambda^T B\beta$ with $\lambda^T = (\lambda_1, \ldots, \lambda_{a+1-p})$ by putting
the first derivatives of $r$ with respect to $\beta$ and $\lambda$ equal to zero. With the notation
$\hat\beta = \beta^*$, we obtain

$$2X^TX\beta^* - 2X^TY + B^T\lambda = 0,$$
$$B\beta^* = 0.$$

Because $r$ is convex we really obtain a minimum in this way. For each
$\theta \in R[X] = \Omega$, $\beta$ is uniquely defined by
$\begin{pmatrix} \theta \\ 0_{a+1-p} \end{pmatrix} = H\beta$, which means that
for each $\theta \in \Omega$ we have $(\theta^T, 0^T_{a+1-p})^T \in R[H]$. Because $H(H^TH)^{-1}H^T$ is the
matrix of the orthogonal projection from $R^{N+a+1-p}$ on $R[H]$ (see Example
4.3), we obtain

$$H(H^TH)^{-1}H^T \begin{pmatrix} \theta \\ 0_{a+1-p} \end{pmatrix} = \begin{pmatrix} \theta \\ 0_{a+1-p} \end{pmatrix}$$

or

$$X(H^TH)^{-1}X^T\theta = \theta, \quad B(H^TH)^{-1}X^T\theta = 0_{a+1-p}$$

for all $\theta \in \Omega$. Therefore $R[X(H^TH)^{-1}B^T] \perp R[X]$ and $B(H^TH)^{-1}X^T = 0$. From
the equations above it follows that $X(H^TH)^{-1}X^T$ is idempotent and by this
the matrix of the orthogonal projection of $R^N$ onto a linear vector space $V$
enclosing $\Omega$.

On the other hand, $V = R[X(H^TH)^{-1}X^T] \subset \Omega$ so that $V = \Omega$ follows. Multiplying
$2X^TX\beta^* - 2X^TY + B^T\lambda = 0$ from the left by $B(H^TH)^{-1}$, we immediately
obtain (because $B(H^TH)^{-1}X^T = 0$) $B(H^TH)^{-1}B^T\lambda = 0$.

Now $B$ has full rank and $H^TH$ is positive definite, so that $B(H^TH)^{-1}B^T$ is
non-singular and $\lambda = 0$ follows. From the normal equations we therefore obtain

$$X^T\hat\theta = X^TX\beta^* = X^TY.$$

Multiplying both sides with $X(H^TH)^{-1}$, we see that $(H^TH)^{-1}$ is a generalised
inverse of $X^TX$. From (5.3) then follows Equation (5.5)
because $H^TH = X^TX + B^TB$.


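Example 5.2 In Example 5.1 let $a = 2$ and initially $n_1 = n_2 = n$. Then we get

$$X^T = \begin{pmatrix} 1 \cdots 1 & 1 \cdots 1 \\ 1 \cdots 1 & 0 \cdots 0 \\ 0 \cdots 0 & 1 \cdots 1 \end{pmatrix},$$

a matrix with $2n$ columns. Without loss of generality we write in (5.4)
$B = (0, 1, 1)$, and by this (5.4) has the form $\sum_{i=1}^{2} a_i = 0$. Writing $N = 2n$, it follows

$$X^TX = \begin{pmatrix} N & n & n \\ n & n & 0 \\ n & 0 & n \end{pmatrix}, \quad
B^TB = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix},$$

and

$$X^TX + B^TB = \begin{pmatrix} 2n & n & n \\ n & n+1 & 1 \\ n & 1 & n+1 \end{pmatrix}$$

is a matrix of rank three. The inverse of this matrix is

$$(X^TX + B^TB)^{-1} = \frac{1}{4n}\begin{pmatrix} n+2 & -n & -n \\ -n & n+2 & n-2 \\ -n & n-2 & n+2 \end{pmatrix}.$$

Using (5.5) and $X^TY = (Y_{..}, Y_{1.}, Y_{2.})^T$, we finally obtain

$$\hat\beta = \begin{pmatrix} \hat\mu \\ \hat a_1 \\ \hat a_2 \end{pmatrix}
= \begin{pmatrix} \bar y_{..} \\ \bar y_{1.} - \bar y_{..} \\ \bar y_{2.} - \bar y_{..} \end{pmatrix}.$$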

In the case $n_1 \ne n_2$, we have with $N = n_1 + n_2$:

$$X^TX = \begin{pmatrix} N & n_1 & n_2 \\ n_1 & n_1 & 0 \\ n_2 & 0 & n_2 \end{pmatrix}.$$

For this case two methods for choosing $B$ can be found in the literature. On the
one hand, analogously to the case with $n_1 = n_2 = n$, one can choose

$$B_1 = (0, 1, 1)$$

and on the other hand

$$B_2 = (0, n_1, n_2).$$

In the first case we have again

$$\sum_{i=1}^{2} a_i = 0,$$

whereas in the second case it follows

$$\sum_{i=1}^{2} n_i a_i = 0.$$

The second case implies that the effects $a_i$ of the factor levels sum to 0 after
multiplication by the sample sizes $n_i$. Especially in designs with several factors
or if the $n_i$ are random (as in animal experiments), such an assumption is not
plausible.


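In the first case ($B_1$), we have

$$(X^TX + B_1^TB_1)^{-1} = \frac{1}{4n_1n_2}\begin{pmatrix}
n_1n_2 + N & n_2 - n_1 - n_1n_2 & n_1 - n_2 - n_1n_2 \\
n_2 - n_1 - n_1n_2 & n_1n_2 + N & n_1n_2 - N \\
n_1 - n_2 - n_1n_2 & n_1n_2 - N & n_1n_2 + N
\end{pmatrix}$$

and the estimator of $\beta$ becomes

$$\hat\beta_1 = \begin{pmatrix} \hat\mu^{(1)} \\ \hat a_1^{(1)} \\ \hat a_2^{(1)} \end{pmatrix}
= \frac{1}{2}\begin{pmatrix} \bar y_{1.} + \bar y_{2.} \\ \bar y_{1.} - \bar y_{2.} \\ \bar y_{2.} - \bar y_{1.} \end{pmatrix}.$$

In the second case ($B_2$), we have

$$(X^TX + B_2^TB_2)^{-1} = \frac{1}{N^2 n_1n_2}\begin{pmatrix}
n_1n_2(1+N) & -n_1n_2 & -n_1n_2 \\
-n_1n_2 & n_2(n_2^2 + n_1 + n_1n_2) & n_1n_2(1-N) \\
-n_1n_2 & n_1n_2(1-N) & n_1(n_1^2 + n_2 + n_1n_2)
\end{pmatrix}$$

and the estimator of $\beta$ is

$$\hat\beta_2 = \begin{pmatrix} \hat\mu^{(2)} \\ \hat a_1^{(2)} \\ \hat a_2^{(2)} \end{pmatrix}
= \begin{pmatrix} \bar y_{..} \\ \bar y_{1.} - \bar y_{..} \\ \bar y_{2.} - \bar y_{..} \end{pmatrix}.$$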

The reader may ask which form of $B$ should be used. There is no general
answer. The two forms $B_1$ and $B_2$ are arbitrary, and many others are possible.
The ambiguity of $B$ reflects the ambiguity of the generalised inverse $(X^TX)^-$.
Therefore estimates of the $a_i$ are less interesting than those of $\mu + a_i$,
which are the same for all possible $B$.
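
The effect of the two side conditions can be checked numerically. The following Python sketch evaluates (5.5) exactly for $B_1 = (0,1,1)$ and $B_2 = (0,n_1,n_2)$; the data values and the helper names `solve` and `estimate` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A non-singular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]


def estimate(samples, B):
    """(mu, a_1, ..., a_a) = (X'X + B'B)^{-1} X'Y for the side condition B beta = 0."""
    a = len(samples)
    n = [len(s) for s in samples]
    N = sum(n)
    XtX = [[N] + n] + [[n[i]] + [n[i] if j == i else 0 for j in range(a)]
                       for i in range(a)]
    XtY = [sum(map(sum, samples))] + [sum(s) for s in samples]
    H = [[XtX[r][c] + B[r] * B[c] for c in range(a + 1)] for r in range(a + 1)]
    return solve(H, XtY)


samples = [[3, 5, 4, 8], [10, 12]]    # hypothetical data with n1 = 4, n2 = 2
b1 = estimate(samples, [0, 1, 1])     # side condition a1 + a2 = 0
b2 = estimate(samples, [0, 4, 2])     # side condition n1*a1 + n2*a2 = 0
print([str(v) for v in b1], [str(v) for v in b2])
print([str(b1[0] + b1[i]) for i in (1, 2)], [str(b2[0] + b2[i]) for i in (1, 2)])
```

In this sketch the two solutions for $\mu$ and the $a_i$ differ, while the sums $\hat\mu + \hat a_i$ equal the group means under both side conditions.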

As shown in Chapter 4, the tests of testable hypotheses about the $a_i$ and the
estimates of estimable functions of the $a_i$ also do not depend on the particular
choice of $B$ or $(X^TX)^-$.

Because the tests of testable hypotheses and the estimation of estimable
functions of the effects of factor levels play the important role in model I, the
ambiguity of $(X^TX)^-$ does not influence the final solution. We therefore solve
the normal equations under side conditions, which results in a simple solution.

We now summarise the definitions of estimable functions and testable
hypotheses for model I of the ANOVA together with some important theorems
and conclusions.

Following Definition 4.4 a linear function $q^T\beta$ of the parameter vector $\beta$ in
(5.1) is estimable, if it equals at least one linear function $t^TE(Y)$ of the
expectation vector of the random variable $Y$ in (5.1).

Then it follows from Theorem 4.11 for the model equation (5.1):


a) The linear functions of $E(Y)$ are estimable.

b) If $q_j^T\beta$ are estimable functions $(j = 1, \ldots, a)$, then also
$\sum_{j=1}^{a} c_j q_j^T\beta$ ($c_j$ real) is an estimable function.

c) $q^T\beta$ is an estimable function if $q^T$ can be written in the form $t^TX$ with $X$ from
(5.1) $(t \in R^N)$.

d) The BLUE of an estimable function $q^T\beta$ is

$$\widehat{q^T\beta} = q^T\beta^* = t^TX(X^TX)^-X^TY$$

with $\beta^*$ from (5.3); it is independent of the choice of $\beta^*$ and by this
independent of the choice of $(X^TX)^-$.

e) The covariance between the BLUE $\widehat{q_1^T\beta}$ and the BLUE $\widehat{q_2^T\beta}$ of two estimable
functions $q_1^T\beta$ and $q_2^T\beta$ is given by

$$\operatorname{cov}\left(\widehat{q_1^T\beta}, \widehat{q_2^T\beta}\right) = \sigma^2 q_1^T(X^TX)^-q_2. \tag{5.6}$$


As we have seen, for the estimation of estimable functions it does not matter which
generalised inverse $(X^TX)^-$ in (5.3) is chosen. Even the variance of the estimator
does not depend on the choice of $(X^TX)^-$ because $\operatorname{cov}(x, x) = \operatorname{var}(x)$.

The concept of an estimable function is closely connected with that of a
testable hypothesis.

A hypothesis $H: K^T\beta = a^*$ with $\beta$ from (5.1) is called testable if with
$K = (k_1, \ldots, k_q)$ and $K^T\beta = \{k_i^T\beta\}$ $(i = 1, \ldots, q)$ the $k_i^T\beta$ are for all $i$ estimable
functions.
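
Condition c) can be checked mechanically. A standard equivalent criterion, which is not stated in the text and is used here as an assumption, is that $q^T$ lies in the row space of $X$ exactly if $q^T(X^TX)^-X^TX = q^T$. The following Python sketch applies it to a one-way layout with three levels; the group sizes (4, 3, 3), the particular generalised inverse and the function name `is_estimable` are chosen only for illustration.

```python
from fractions import Fraction

# X^T X for three levels with (hypothetical) sizes 4, 3, 3 and one
# generalised inverse G = diag(0, 1/4, 1/3, 1/3).
XtX = [[10, 4, 3, 3], [4, 4, 0, 0], [3, 0, 3, 0], [3, 0, 0, 3]]
G = [[Fraction(0)] * 4 for _ in range(4)]
for k, n_k in zip((1, 2, 3), (4, 3, 3)):
    G[k][k] = Fraction(1, n_k)


def is_estimable(q):
    """Check q^T (X^T X)^- X^T X == q^T, i.e. q^T lies in the row space of X."""
    qG = [sum(q[i] * G[i][j] for i in range(4)) for j in range(4)]
    qGA = [sum(qG[i] * XtX[i][j] for i in range(4)) for j in range(4)]
    return qGA == list(q)


print(is_estimable((1, 1, 0, 0)))    # mu + a_1
print(is_estimable((0, 1, -1, 0)))   # a_1 - a_2
print(is_estimable((0, 1, 0, 0)))    # a_1 alone
```

Under these assumptions $\mu + a_1$ and $a_1 - a_2$ pass the check, whereas $a_1$ alone does not.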

Finally we give some results for generalised inverses in the form of lemmas. As
used already before, each matrix $A^-$ for which

$$AA^-A = A$$

is called a generalised inverse of the matrix $A$.


Lemma 5.1 If $(X^TX)^-$ is a generalised inverse of the symmetric matrix $X^TX$,
we get

$$X(X^TX)^-X^TX = X, \quad X^TX(X^TX)^-X^T = X^T.$$


Lemma 5.2 For a system of simultaneous linear equations $X^TXx = X^Ty$ (normal
equations), all solution vectors $x$ have the form

$$x = (X^TX)^-X^Ty.$$


Lemma 5.3 If $M$ is a symmetric matrix of the form

$$M = \begin{pmatrix} A & B \\ B^T & D \end{pmatrix},$$

then with $Q = D - B^TA^-B$ the matrix

$$M^- = \begin{pmatrix} A^- + A^-BQ^-B^TA^- & -A^-BQ^- \\ -Q^-B^TA^- & Q^- \end{pmatrix}
= \begin{pmatrix} A^- & 0 \\ 0 & 0 \end{pmatrix}
+ \begin{pmatrix} -A^-B \\ I \end{pmatrix} Q^- \begin{pmatrix} -B^TA^- & I \end{pmatrix}$$

is a generalised inverse of $M$, with the identity matrix $I$.


5.2 Analysis of Variance with One Factor (Simple- or 
One-Way Analysis of Variance) 


In this section we investigate the situation that in an experiment several ‘treat- 
ments’ or levels of a factor A have to be compared with each other. The corre- 
sponding analysis is often called ‘simple ANOVA’. 


5.2.1 The Model and the Analysis 


We start with a model equation of the form (5.2) and call $\mu$ the total mean and $a_i$
the effect of the $i$th level of factor A. In Table 5.1 we find the scheme of the
observations of an experiment with $a$ levels $A_1, \ldots, A_a$ of factor A and $n_i$
observations for the $i$th level $A_i$ of A.


Table 5.1 Observations $y_{ij}$ of an experiment with $a$ levels of a factor.

Number of the levels of the factor

        1          2          ...   i          ...   a
        y_{11}     y_{21}     ...   y_{i1}     ...   y_{a1}
        y_{12}     y_{22}     ...   y_{i2}     ...   y_{a2}
        ...        ...              ...              ...
        y_{1n_1}   y_{2n_2}   ...   y_{in_i}   ...   y_{an_a}
n_i     n_1        n_2        ...   n_i        ...   n_a


If an experiment is designed to draw conclusions about the levels $A_i$ occurring
in the experiment, then model I is appropriate and we use the mathematical
model I introduced in Definition 5.1 as the basis for the design and analysis. If
however the $A_i$ are randomly selected from a universe of levels, then model II as
described in Chapter 6 is used.

We use Equation (5.2) with the side conditions

$$E(e_{ij}) = 0, \quad \operatorname{cov}(e_{ij}, e_{kl}) = \delta_{ik}\delta_{jl}\sigma^2.$$

For testing hypotheses the $e_{ij}$ and by this also the $y_{ij}$ are assumed to be
normally distributed. Then it follows from the examples above.


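Theorem 5.2 Solutions $\hat a_i$ for the $a_i$ $(i = 1, \ldots, a)$ and $\hat\mu$ for $\mu$ of the normal
equation (5.5) for model equation (5.2) are given by

$${}_1\hat\mu = \frac{1}{a}\sum_{i=1}^{a}\bar y_{i.} \tag{5.7}$$

$${}_1\hat a_i = \frac{a-1}{a}\bar y_{i.} - \frac{1}{a}\sum_{j \ne i}\bar y_{j.} \tag{5.8}$$

in the case (5.4) for the matrix $B = (0, 1, \ldots, 1)$.
In the case (5.4) for the matrix $B = (0, n_1, \ldots, n_a)$, they are given by

$${}_2\hat\mu = \bar y_{..} \tag{5.9}$$

$${}_2\hat a_i = \bar y_{i.} - \bar y_{..} \tag{5.10}$$

Both estimations are identical if $n_i = n$ $(i = 1, \ldots, a)$. The variance $\sigma^2$ in both
cases is unbiasedly estimated by

$$s^2 = \frac{1}{N-a}\sum_{i=1}^{a}\sum_{j=1}^{n_i}\left(y_{ij} - \bar y_{i.}\right)^2.$$

The proof of the first part of that theorem follows from (5.5). For
$B = (0, 1, \ldots, 1)$ is

$$X^TX + B^TB = \begin{pmatrix}
N & n_1 & n_2 & \cdots & n_a \\
n_1 & n_1 + 1 & 1 & \cdots & 1 \\
n_2 & 1 & n_2 + 1 & \cdots & 1 \\
\vdots & \vdots & \vdots & & \vdots \\
n_a & 1 & 1 & \cdots & n_a + 1
\end{pmatrix}.$$

For $B = (0, n_1, \ldots, n_a)$,

$$X^TX + B^TB = \begin{pmatrix}
N & n_1 & n_2 & \cdots & n_a \\
n_1 & n_1 + n_1^2 & n_1n_2 & \cdots & n_1n_a \\
n_2 & n_2n_1 & n_2 + n_2^2 & \cdots & n_2n_a \\
\vdots & \vdots & \vdots & & \vdots \\
n_a & n_an_1 & n_an_2 & \cdots & n_a + n_a^2
\end{pmatrix}.$$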

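More simply, we can obtain (5.7) and (5.8) also by minimising

$$\sum_{i=1}^{a}\sum_{j=1}^{n_i}\left(y_{ij} - \mu - a_i\right)^2$$

under the side condition $\sum_{i=1}^{a} a_i = 0$. The solutions (5.9) and (5.10) can be
obtained by minimising

$$\sum_{i=1}^{a}\sum_{j=1}^{n_i}\left(y_{ij} - \mu - a_i\right)^2$$

under the side condition $\sum_{i=1}^{a} n_i a_i = 0$. It follows

$$E({}_1\hat\mu) = \frac{1}{a}\sum_{i=1}^{a} E(\bar y_{i.}) = \frac{1}{a}\sum_{i=1}^{a}(\mu + a_i) = \mu$$

because of $\sum_{i=1}^{a} a_i = 0$ and

$$E({}_1\hat a_i) = \frac{a-1}{a}(\mu + a_i) - \frac{1}{a}\sum_{j \ne i}(\mu + a_j)
= \frac{a-1}{a}a_i + \frac{1}{a}a_i = a_i$$

because of $\sum_{j \ne i} a_j = -a_i$.

Analogously the unbiasedness of (5.9) and (5.10) under the corresponding
side conditions can be shown.
The second part of the theorem is a special case of Theorem 4.4.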


Estimable functions of the model parameters are, for instance,
$\mu + a_i$ $(i = 1, \ldots, a)$ or $a_i - a_j$ $(i, j = 1, \ldots, a;\ i \ne j)$ with the estimators

$$\widehat{\mu + a_i} = \bar y_{i.} = {}_1\hat\mu + {}_1\hat a_i = {}_2\hat\mu + {}_2\hat a_i$$

and

$$\widehat{a_i - a_j} = \bar y_{i.} - \bar y_{j.} = {}_1\hat a_i - {}_1\hat a_j = {}_2\hat a_i - {}_2\hat a_j,$$

respectively. They are independent of the special choice of $B$ and of $(X^TX)^-$.

One example of a problem that can be treated in model I is the test of the
null hypothesis $H_0: a_i = a_j$ for all $i \ne j$ against the alternative that at least two
$a_i$ differ from each other. This null hypothesis corresponds with the assumption
that the effects of the factor considered are equal for all $a$ levels. The basis of the
corresponding tests is the fact that the sum of squared deviations SS of the $y_{ij}$
from the total mean of the experiment $\bar y_{..}$ can be broken down into independent
components. The following trivial conclusion is formulated as a theorem due to its
importance.


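Theorem 5.3 Let us draw samples from $a$ populations $P_i$ and let $y_{ij}$ be the $j$th
observation of the sample from the $i$th population and $\bar y_{i.}$ the mean of this
sample. Let $N$ be the total number of observations and $\bar y_{..}$ the total mean of
the experiment. The sum of squared deviations of the observations from the
total mean of the experiment

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i}\left(y_{ij} - \bar y_{..}\right)^2 = Y^TY - N\bar y_{..}^2
\quad\text{with}\quad Y^T = (y_{11}, \ldots, y_{an_a})$$

can be written in the form

$$Y^TY - N\bar y_{..}^2 = Y^T\left[I_N - X(X^TX)^-X^T\right]Y + Y^TX(X^TX)^-X^TY - N\bar y_{..}^2$$

or as

$$\sum_{i=1}^{a}\sum_{j=1}^{n_i}\left(y_{ij} - \bar y_{..}\right)^2
= \sum_{i=1}^{a}\sum_{j=1}^{n_i}\left(y_{ij} - \bar y_{i.}\right)^2
+ \sum_{i=1}^{a}\sum_{j=1}^{n_i}\left(\bar y_{i.} - \bar y_{..}\right)^2.$$

The left-hand side is called SS total or for short $SS_T$; the first component of the
right-hand side is called SS within the treatments or levels of factor A (short SS
within or $SS_w = SS_{res}$) and the second component SS between the treatments or
levels of factor A ($SS_b = SS_A$), respectively.
We generally write

$$SS_T = \sum_{i,j} y_{ij}^2 - \frac{Y_{..}^2}{N},$$

$$SS_{res} = \sum_{i,j} y_{ij}^2 - \sum_{i}\frac{Y_{i.}^2}{n_i},$$

$$SS_A = \sum_{i}\frac{Y_{i.}^2}{n_i} - \frac{Y_{..}^2}{N}.$$
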
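
The decomposition rests on the vanishing of a cross term. A short derivation, using only the definitions of the group means $\bar y_{i.}$ and the total mean $\bar y_{..}$, may make this step explicit:

```latex
\begin{align*}
\sum_{i=1}^{a}\sum_{j=1}^{n_i}\left(y_{ij}-\bar y_{..}\right)^2
  &= \sum_{i=1}^{a}\sum_{j=1}^{n_i}
     \left[\left(y_{ij}-\bar y_{i.}\right)+\left(\bar y_{i.}-\bar y_{..}\right)\right]^2 \\
  &= \sum_{i=1}^{a}\sum_{j=1}^{n_i}\left(y_{ij}-\bar y_{i.}\right)^2
   + 2\sum_{i=1}^{a}\left(\bar y_{i.}-\bar y_{..}\right)
      \sum_{j=1}^{n_i}\left(y_{ij}-\bar y_{i.}\right)
   + \sum_{i=1}^{a} n_i\left(\bar y_{i.}-\bar y_{..}\right)^2 .
\end{align*}
```

Since $\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i.}) = 0$ for every $i$, the middle term is zero, and the last term equals $\sum_i\sum_j(\bar y_{i.} - \bar y_{..})^2$.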
Theorem 5.4 Under the assumptions of Definition 5.1,

$$F = \frac{(N-a)\,SS_A}{(a-1)\,SS_{res}} \tag{5.11}$$

is distributed as $F(a-1, N-a, \lambda)$ with the non-centrality parameter

$$\lambda = \frac{1}{\sigma^2}\beta^TX^T\left(X(X^TX)^-X^T - \frac{1}{N}1_{N,N}\right)X\beta$$

and $1_{N,N} = 1_N1_N^T$.
If $H_0: a_1 = \cdots = a_a$, then $F$ because of $\lambda = 0$ is $F(a-1, N-a)$-distributed.


Proof: $Y = (y_{11}, \ldots, y_{1n_1}, \ldots, y_{an_a})^T$ is $N(X\beta, \sigma^2I_N)$-distributed. Because of
Theorem 5.3, $Y^TY$ is the sum of three quadratic forms taking
$N\bar y_{..}^2 = \frac{1}{N}Y^T1_{N,N}Y$ into account, namely,

$$Y^TY = Y^TA_1Y + Y^TA_2Y + Y^TA_3Y$$

with

$$A_1 = I_N - X(X^TX)^-X^T, \quad A_2 = X(X^TX)^-X^T - \frac{1}{N}1_{N,N}, \quad A_3 = \frac{1}{N}1_{N,N}.$$

From Lemma 5.1 $X(X^TX)^-X^T$ is idempotent of rank $a$ and by this $A_1$ is
idempotent of rank $N-a$. Further $A_3$ is idempotent of rank 1. Because $1_N^T$ is the
first row of $X^T$, it follows from Lemma 5.1 that $1_N^TX(X^TX)^-X^T = 1_N^T$ and from this
the idempotence of $A_2$. The rank of $A_2$ is $a-1$. By this, for instance, condition 1
of Theorem 4.6 ($N = n$, $n_1 = N-a$, $n_2 = a-1$, $n_3 = 1$) is fulfilled. Therefore
$\frac{1}{\sigma^2}Y^TA_1Y$ is $CS(N-a, \lambda_1)$-distributed and $\frac{1}{\sigma^2}Y^TA_2Y$ is independent of
$\frac{1}{\sigma^2}Y^TA_1Y$ distributed as $CS(a-1, \lambda_2)$ with

$$\lambda_1 = \frac{1}{\sigma^2}\beta^TX^TA_1X\beta = 0$$

and

$$\lambda_2 = \frac{1}{\sigma^2}\beta^TX^T\left(X(X^TX)^-X^T - \frac{1}{N}1_{N,N}\right)X\beta = \lambda.$$

This completes the proof.


Following Theorem 5.4 the hypothesis $H_0: a_1 = \cdots = a_a$ can be tested by an
F-test. The ratios $MS_A = MS_b = \frac{SS_A}{a-1}$ and $MS_R = MS_{res} = MS_w = \frac{SS_{res}}{N-a}$ are called
mean squares between treatments and within treatments or residual mean
squares, respectively. The expectations of these MS are

$$E(MS_A) = \sigma^2 + \frac{1}{a-1}\left[\sum_{i=1}^{a} n_ia_i^2 - \frac{1}{N}\left(\sum_{i=1}^{a} n_ia_i\right)^2\right]
= \sigma^2 + \frac{\sigma^2}{a-1}\lambda$$

and

$$E(MS_R) = \sigma^2.$$

Under the reparametrisation condition $\sum_{i=1}^{a} n_ia_i = 0$, we obtain

$$E(MS_A) = \sigma^2 + \frac{1}{a-1}\sum_{i=1}^{a} n_ia_i^2.$$

Now the several steps in the simple ANOVA for model I can be summarised 
as follows. 

We assumed that from systematically selected normally distributed populations
with expectations $\mu + a_i$ and the same variance $\sigma^2$, representing the levels of a
factor (also called treatments), independent random samples of size $n_i$ have
been drawn. For the $N$ observations $y_{ij}$, we assume model equation (5.2) with its
side conditions. From the observations in Table 5.1, the column sums $Y_{i.}$ and
the numbers of observations are initially calculated. The corresponding means

$$\bar y_{i.} = \frac{Y_{i.}}{n_i}$$

are UVUE under the assumed normal distribution and for arbitrary distributions
with finite second moments BLUE of the $\mu + a_i$.
To test the null hypothesis $a_1 = \cdots = a_a$ that all treatment effects are equal
and by this all samples stem from the same population, we need the sums

$$\sum_{i,j} y_{ij}^2, \quad \sum_{i}\frac{Y_{i.}^2}{n_i} \quad\text{and further}\quad \frac{Y_{..}^2}{N}.$$

With these sums, a so-called theoretical ANOVA table can be constructed as
shown in Table 5.2. Such an ANOVA table lists the so-called sources of
variation (between the treatments or levels of factor A, residual (or within the
levels) and total). The SS is in the second column, the degrees of freedom (df) in
the third, the MS in the fourth, the E(MS) in the fifth and the F-statistic in
the sixth.


Table 5.2 Theoretical analysis of variance table of the one-way analysis of variance model
I ($\sum n_ia_i = 0$).

Source of variation   SS                                                               df      MS                                  E(MS)                                          F
Main effect A         $SS_A = \sum_i \frac{Y_{i.}^2}{n_i} - \frac{Y_{..}^2}{N}$        $a-1$   $MS_A = \frac{SS_A}{a-1}$           $\sigma^2 + \frac{1}{a-1}\sum_i n_ia_i^2$      $F = \frac{MS_A}{MS_{res}}$
Residual              $SS_{res} = \sum_{i,j} y_{ij}^2 - \sum_i \frac{Y_{i.}^2}{n_i}$   $N-a$   $MS_{res} = \frac{SS_{res}}{N-a}$   $\sigma^2$
Total                 $SS_T = \sum_{i,j} y_{ij}^2 - \frac{Y_{..}^2}{N}$                $N-1$

MS, mean squares; SS, sum of squares.


In a practical ANOVA table with data (computer output), the column E(MS)
does not occur, and no random variables but only their realisations appear.

The following functions of the parameters in $\beta$ are estimable: $\mu + a_i$, and the
$\widehat{\mu + a_i} = \bar y_{i.}$, $i = 1, \ldots, a$, are BLUEs. Further $\sum_i c_i(\mu + a_i)$ is estimable by the BLUE
$\sum_i c_i\bar y_{i.}$. (Under normality assumptions they are UVUEs.)
Further all linear contrasts $\sum_i c_ia_i$ with $\sum_i c_i = 0$ as, for instance,
differences $a_i - a_j$ $(i \ne j)$ between the components of $a$ ($c_i = 1$, $c_j = -1$) or terms
of the form $2a_j - a_s - a_r$ ($c_j = 2$, $c_s = -1$, $c_r = -1$; $j \ne s \ne r$) are estimable. The
advantage of estimable functions is their independence of the special choice of
$(X^TX)^-$ and that a hypothesis $H_0: K^T\beta = a^*$ with the test statistic given in (4.34) is
testable if $K^T\beta$ is estimable.


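Because the hypothesis $a_1 = \cdots = a_a$ can be written in the form $K^T\beta = 0$ with
$\beta = (\mu, a_1, \ldots, a_a)^T$ and the $[(a-1) \times (a+1)]$ matrix

$$K^T = \begin{pmatrix}
0 & 1 & -1 & 0 & \cdots & 0 \\
0 & 1 & 0 & -1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
0 & 1 & 0 & 0 & \cdots & -1
\end{pmatrix} = (0_{a-1}, 1_{a-1}, -I_{a-1}),$$

it is testable. The test statistic as introduced in Theorem 5.4 is, along with the
given $K^T$, a special case of the test statistic $F$ in (4.34).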

Introducing side conditions can change the conclusions about estimability
and the BLUE. For example, under the condition $\sum_{i=1}^{a} n_ia_i = 0$, the parameter $\mu$
is estimable; the BLUE is $\bar y_{..}$. This also means that the hypothesis $H_0: \mu = 0$ is
testable.

Also under the side condition $\sum_{i=1}^{a} a_i = 0$, the parameter $\mu$ is estimable; but
the BLUE is now $\frac{1}{a}\sum_{i=1}^{a}\bar y_{i.}$.
Concerning the ambiguity of $(X^TX)^-$ and the choice of particular side
conditions, we make some general remarks, which are also valid for other
classifications in the following sections but will not be repeated:


• Independent of the special choice of $(X^TX)^-$ and by this of the choice of the
side conditions are:
  - The SS, MS and F-values in the ANOVA tables of testable hypotheses
  - The estimators of estimable functions

• In practical applications we do not need estimates of non-estimable functions.
If, for example, three animal feeds for pigs have to be analysed and the model
$y_{ij} = \mu + a_i + e_{ij}$ is used, the evaluation of these feeds can be done by
$\mu + a_1$, $\mu + a_2$ and $\mu + a_3$, and the parameters $a_1$, $a_2$ and $a_3$ are not needed.

• If a problem is independent of the special choice of $(X^TX)^-$, it is often
favourable for the derivation of formulae to do this under special side conditions.
Normal equations under side conditions can often be relatively simple.


We demonstrate this by an example. 


Example 5.3 In an insemination centre, three sires $B_1$, $B_2$, $B_3$ are available.
By help of the milk fat yields $y_{ij}$ $(i = 1, 2, 3;\ j = 1, \ldots, n_i)$ of $n_i$ daughters of these
sires, it shall be examined whether differences in the breeding value of these sires
concerning the milk fat exist. We assume that the observations $y_{ij}$ are realisations
of $N(\mu + a_i, \sigma^2)$-distributed and independent random variables following model
(5.2). Table 5.3 contains the performances $y_{ij}$ of the daughters of the three sires.
We can ask the following:


• What is the breeding value of the sires?

• Is the null hypothesis $H_0: a_1 = a_2 = a_3$ valid?

• What are the estimates of $a_1 - a_2$ and $-8a_1 - 6a_2 + 14a_3$?

• Can we accept the null hypothesis $H_0: a_1 - a_2 = 0$, $-8a_1 - 6a_2 + 14a_3 = 0$?


All tests should be done with a first kind risk of $\alpha = 0.05$.
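It follows from (5.1) and (5.2), respectively,

$$\begin{aligned}
y_{11} &= 120 = \mu + a_1 + e_{11}, \\
y_{12} &= 155 = \mu + a_1 + e_{12}, \\
y_{13} &= 131 = \mu + a_1 + e_{13}, \\
y_{14} &= 130 = \mu + a_1 + e_{14}, \\
y_{21} &= 153 = \mu + a_2 + e_{21}, \\
y_{22} &= 144 = \mu + a_2 + e_{22}, \\
y_{23} &= 147 = \mu + a_2 + e_{23}, \\
y_{31} &= 130 = \mu + a_3 + e_{31}, \\
y_{32} &= 138 = \mu + a_3 + e_{32}, \\
y_{33} &= 122 = \mu + a_3 + e_{33},
\end{aligned}$$


Table 5.3 Performances (milk fat in kg) $y_{ij}$ of the daughters of three sires.

           Sire
           B_1    B_2    B_3
y_{ij}     120    153    130
           155    144    138
           131    147    122
           130
n_i        4      3      3
Y_{i.}     536    444    390


and by this it is in (5.1)

$$Y = (120, 155, 131, 130, 153, 144, 147, 130, 138, 122)^T,$$

$$\beta = (\mu, a_1, a_2, a_3)^T, \quad e = (e_{11}, \ldots, e_{33})^T,$$

$$X = \begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{pmatrix} = (1_{10}, 1_4 \oplus 1_3 \oplus 1_3),$$

$a = 3$, $n_1 = 4$, $n_2 = n_3 = 3$ and $N = 10$.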


All hypotheses are testable; $a_1 - a_2$ and $-8a_1 - 6a_2 + 14a_3$ are estimable
functions.

It is sufficient to calculate any generalised inverse of $X^TX$. In this example, in
solution 1 once more, a generalised inverse of $X^TX$ is calculated; solution 2 shows
the approach by using the formulae derived in this section. In the examples of
the following sections, only the simple formulae of the SS are used.


Solution 1: 


To calculate $(X^TX)^-$ an algorithm exploiting the symmetry of $X^TX$ is used:


• Determine $\operatorname{rk}(X^TX) = r$.

• Select a non-singular $(r \times r)$ submatrix of rank $r$ and invert it.

• Replace each element of the submatrix of $X^TX$ by the element of the inverse
and the other elements of $X^TX$ by zeros.

We initially calculate

$$X^TX = \begin{pmatrix} 10 & 4 & 3 & 3 \\ 4 & 4 & 0 & 0 \\ 3 & 0 & 3 & 0 \\ 3 & 0 & 0 & 3 \end{pmatrix}.$$

The sum of the last three rows is equal to the first one. Because the submatrix
$\begin{pmatrix} 4 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}$ has rank 3, we get $\operatorname{rk}(X^TX) = r = 3$. The inverse of
$\begin{pmatrix} 4 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}$ equals
$\begin{pmatrix} \frac14 & 0 & 0 \\ 0 & \frac13 & 0 \\ 0 & 0 & \frac13 \end{pmatrix}$, and therefore we obtain

$$(X^TX)^- = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & \frac14 & 0 & 0 \\ 0 & 0 & \frac13 & 0 \\ 0 & 0 & 0 & \frac13 \end{pmatrix}.$$

As a check we can show that $(X^TX)(X^TX)^-(X^TX) = X^TX$.
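
The three steps of the algorithm and the final check can be reproduced with exact arithmetic. The following Python sketch is only an illustration under the assumption of a symmetric matrix and a principal submatrix; the helper names `inverse`, `matmul` and `g_inverse` are not taken from the text.

```python
from fractions import Fraction


def inverse(A):
    """Exact inverse of a non-singular square matrix (Gauss-Jordan)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def matmul(A, B):
    """Matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def g_inverse(A, idx):
    """Generalised inverse of a symmetric A from the principal submatrix on idx."""
    sub = [[A[r][c] for c in idx] for r in idx]
    inv = inverse(sub)
    G = [[Fraction(0)] * len(A) for _ in A]      # all other elements are zero
    for p, r in enumerate(idx):
        for q, c in enumerate(idx):
            G[r][c] = inv[p][q]
    return G


XtX = [[10, 4, 3, 3], [4, 4, 0, 0], [3, 0, 3, 0], [3, 0, 0, 3]]
G = g_inverse(XtX, [1, 2, 3])                    # rows/columns 2 to 4 of X^T X
check = matmul(matmul(XtX, G), XtX)              # should reproduce X^T X
XtY = [1370, 536, 444, 390]                      # (Y.., Y1., Y2., Y3.)
beta_star = [sum(g * y for g, y in zip(row, XtY)) for row in G]
print(check == XtX, [str(b) for b in beta_star])
```

With the column sums of Table 5.3, the same generalised inverse gives a solution of the normal equations whose last three components are the group means.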
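To calculate $\beta^*$ first we find

$$X^TY = (Y_{..}, Y_{1.}, Y_{2.}, Y_{3.})^T = (1370, 536, 444, 390)^T,$$

and then we obtain

$$\beta^* = (X^TX)^-X^TY = \begin{pmatrix} 0 \\ \bar y_{1.} \\ \bar y_{2.} \\ \bar y_{3.} \end{pmatrix}
= \begin{pmatrix} 0 \\ 134 \\ 148 \\ 130 \end{pmatrix}
= \begin{pmatrix} \mu^* \\ a_1^* \\ a_2^* \\ a_3^* \end{pmatrix}.$$

The breeding value of the sires is estimated by $\bar y_{i.}$. The estimable functions
$\mu + a_i$ are estimated by 134, 148 and 130, respectively.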

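To test the null hypothesis $H_0: a_1 = a_2 = a_3$, we calculate the test statistic
(4.34), namely,

$$F = \frac{(K^T\beta^* - a^*)^T\left[K^T(X^TX)^-K\right]^{-1}(K^T\beta^* - a^*)}{Y^T\left(I_N - X(X^TX)^-X^T\right)Y} \cdot \frac{N-p}{q},$$

where $H_0: a_1 = a_2 = a_3$ is written in the form $K^T\beta = a^*$ with $a^* = 0$ and
$K^T = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix}$ and we have $p = a$, $q = a-1$.

The realisation $F$ of the test statistic is

$$F = \frac{(a_1^* - a_2^*,\ a_1^* - a_3^*)\begin{pmatrix} \frac{7}{12} & \frac{3}{12} \\ \frac{3}{12} & \frac{7}{12} \end{pmatrix}^{-1}\begin{pmatrix} a_1^* - a_2^* \\ a_1^* - a_3^* \end{pmatrix}}{Y^T\left(I_N - X(X^TX)^-X^T\right)Y} \cdot \frac{7}{2}.$$

The inverse is $\frac{12}{40}\begin{pmatrix} 7 & -3 \\ -3 & 7 \end{pmatrix}$ and the numerator (because of $a_i^* - a_j^* = \bar y_{i.} - \bar y_{j.}$;
$\bar y_{1.} - \bar y_{2.} = -14$, $\bar y_{1.} - \bar y_{3.} = 4$) finally becomes

$$(-14, 4)\,\frac{12}{40}\begin{pmatrix} 7 & -3 \\ -3 & 7 \end{pmatrix}\begin{pmatrix} -14 \\ 4 \end{pmatrix} = 546.$$

We further have

$$X(X^TX)^-X^T = \frac14 1_{4,4} \oplus \frac13 1_{3,3} \oplus \frac13 1_{3,3}.$$

In the denominator it is $Y^TI_NY = \sum_{i,j} y_{ij}^2 = 189068$,

$$Y^TX(X^TX)^-X^TY = Y^T\left[\frac14 1_{4,4} \oplus \frac13 1_{3,3} \oplus \frac13 1_{3,3}\right]Y = \sum_{i=1}^{3}\frac{Y_{i.}^2}{n_i} = 188236,$$

and

$$F = \frac{546}{832} \cdot \frac{7}{2} = 2.297.$$

The quantile of the F-distribution in Table A.5 for $\alpha = 0.05$ with 2 and 7
degrees of freedom is 4.74, and therefore the null hypothesis $H_0: a_1 = a_2 = a_3$
is not rejected. The estimate of $a_1 - a_2$ is, as already mentioned, $\bar y_{1.} - \bar y_{2.} = -14$.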


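By (5.6) we can calculate $\operatorname{var}(\widehat{a_1 - a_2})$. Because $a_1 - a_2$ has the form $q^T\beta$ with
$q^T = (0, 1, -1, 0)$, it follows from (5.6):

$$\operatorname{var}(\widehat{a_1 - a_2}) = (0, 1, -1, 0)\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & \frac14 & 0 & 0 \\ 0 & 0 & \frac13 & 0 \\ 0 & 0 & 0 & \frac13 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \end{pmatrix}\sigma^2 = \frac{7}{12}\sigma^2.$$

The function $-8a_1 - 6a_2 + 14a_3$ is a linear contrast and estimable. Following
Theorem 4.10 and because of $-8a_1 - 6a_2 + 14a_3 = (0, -8, -6, 14)\beta$, the BLUE of
this linear contrast is

$$(0, -8, -6, 14)\beta^* = -8\bar y_{1.} - 6\bar y_{2.} + 14\bar y_{3.} = -140.$$

Taking

$$(0, -8, -6, 14)(X^TX)^-\begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \end{pmatrix} = 0$$

into account, the two contrasts are orthogonal. From (5.6) we obtain the variance
of the estimated contrast as $\frac{280}{3}\sigma^2 = 93\tfrac{1}{3}\sigma^2$.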

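The null hypothesis

$$H_0: K^T\beta = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & -8 & -6 & 14 \end{pmatrix}\beta = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

is tested by the test statistic of Theorem 5.4 with

$$K^T = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & -8 & -6 & 14 \end{pmatrix} \quad\text{and}\quad G = (X^TX)^-.$$

It follows

$$K^TGK = \begin{pmatrix} \frac{7}{12} & 0 \\ 0 & \frac{280}{3} \end{pmatrix} \quad\text{and}\quad
(K^TGK)^{-1} = \frac{1}{280}\begin{pmatrix} 480 & 0 \\ 0 & 3 \end{pmatrix}.$$

The SS in the numerator of $F$ is then

$$(-14, -140)\,\frac{1}{280}\begin{pmatrix} 480 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} -14 \\ -140 \end{pmatrix} = 336 + 210 = 546.$$

The realisation $F$ of the test statistic in this case is again

$$F = \frac{546}{832} \cdot \frac{7}{2} = 2.297.$$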
However, in contrast to the null hypothesis written with non-orthogonal
contrasts, the sub-hypotheses

$$H_0: a_1 = a_2, \quad H_0: -8a_1 - 6a_2 + 14a_3 = 0$$

with the numerators SS, 336 and 210, respectively (with one degree of freedom
each) can be tested separately so that the test of one hypothesis is independent
of the validity of the other. For $H_0: a_1 = a_2$ the test statistic is

$$F = \frac{336}{832} \cdot 7 = 2.827$$

and for the hypothesis $H_0: -8a_1 - 6a_2 + 14a_3 = 0$, the test statistic is

$$F = \frac{210}{832} \cdot 7 = 1.767.$$

The two sub-hypotheses are both accepted.
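
The numerators for both ways of writing the hypothesis can be recomputed with exact arithmetic. The following Python sketch is restricted to hypotheses with two rows in $K^T$ and uses the solution $\beta^*$ and the residual sum of squares from this example; the function name `numerator_ss` is an assumption made for illustration.

```python
from fractions import Fraction

g = [Fraction(0), Fraction(1, 4), Fraction(1, 3), Fraction(1, 3)]  # diag of (X^T X)^-
beta = [0, 134, 148, 130]                                          # solution beta*
SS_RES, DF_RES = 832, 7


def numerator_ss(K):
    """(K b)^T [K G K^T]^{-1} (K b) for a hypothesis with two rows in K."""
    kb = [sum(k * b for k, b in zip(row, beta)) for row in K]
    m = [[sum(u * d * v for u, d, v in zip(r1, g, r2)) for r2 in K] for r1 in K]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]
    return sum(kb[i] * inv[i][j] * kb[j] for i in range(2) for j in range(2))


K_plain = [(0, 1, -1, 0), (0, 1, 0, -1)]     # a1 = a2 and a1 = a3
K_orth = [(0, 1, -1, 0), (0, -8, -6, 14)]    # the two orthogonal contrasts
ss_plain = numerator_ss(K_plain)
ss_orth = numerator_ss(K_orth)

# With orthogonal contrasts the numerator splits into one SS per contrast.
ss_parts = [Fraction(sum(k * b for k, b in zip(row, beta))) ** 2
            / sum(k * d * k for k, d in zip(row, g)) for row in K_orth]
ms_res = Fraction(SS_RES, DF_RES)
F_total = ss_orth / 2 / ms_res
F_parts = [s / ms_res for s in ss_parts]
print(ss_plain, ss_orth, [str(s) for s in ss_parts])
print(round(float(F_total), 3), [round(float(f), 3) for f in F_parts])
```

Both formulations lead to the same numerator, and only the orthogonal contrasts split it into additive parts with one degree of freedom each.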


Solution 2:

This solution is the usual one for practical calculations. Initially the values in
Tables 5.3 and 5.4 as well as $Y_{..}^2 = 1876900$ and $\frac{1}{N}Y_{..}^2 = 187690$ are calculated.
The $\bar y_{i.}$ are estimates of $\mu + a_i$ $(i = 1, 2, 3)$. To test the null hypothesis
$H_0: a_1 = a_2 = a_3$, we need an ANOVA table such as Table 5.2 without E(MS)
(Table 5.5). The values of this table can be obtained from Table 5.4 (e.g.
188 236 - 187 690 = 546). The decomposition of the SS between sires in additive
components concerning the orthogonal contrasts is shown in Table 5.6.


Table 5.4 Results in the analysis of variance of the material in Table 5.3.

Sire    Y_{i.}    Y_{i.}^2    Y_{i.}^2/n_i    \sum_j y_{ij}^2
B_1     536       287 296     71 824          72 486
B_2     444       197 136     65 712          65 754
B_3     390       152 100     50 700          50 828
Sum     1370                  188 236         189 068


Table 5.5 Analysis of variance table for testing the hypothesis $a_1 = a_2 = a_3$ of Example 5.3.

Source of variation    SS      df    MS        F
Between sires          546     2     273.00    2.297
Within sires           832     7     118.86
Total                  1378    9


Table 5.6 Table for testing the hypotheses $a_1 = a_2$ and $-8a_1 - 6a_2 + 14a_3 = 0$.

Source of variation         SS      df    MS        F
a_1 - a_2                   336     1     336.00    2.827
-8a_1 - 6a_2 + 14a_3        210     1     210.00    1.767
Between sires               546     2     273.00    2.297
Within sires                832     7     118.86
Total                       1378    9
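
The calculations of solution 2 can be collected in a few lines of code. The following Python sketch applies the computing formulae of Table 5.2 to the data of Table 5.3; the function name `one_way_anova` and the layout of its result are assumptions made for illustration.

```python
from fractions import Fraction


def one_way_anova(samples):
    """SS, df, MS and F of the one-way layout from the computing formulae."""
    N = sum(len(s) for s in samples)
    a = len(samples)
    total = sum(sum(s) for s in samples)
    sum_sq = sum(y * y for s in samples for y in s)              # sum of y_ij^2
    group_term = sum(Fraction(sum(s) ** 2, len(s)) for s in samples)
    correction = Fraction(total ** 2, N)                         # Y..^2 / N
    ss_a, ss_res = group_term - correction, sum_sq - group_term
    ms_a, ms_res = ss_a / (a - 1), ss_res / (N - a)
    return {"SS_A": ss_a, "SS_res": ss_res, "SS_T": ss_a + ss_res,
            "df": (a - 1, N - a, N - 1), "MS_A": ms_a, "MS_res": ms_res,
            "F": ms_a / ms_res}


fat = [[120, 155, 131, 130], [153, 144, 147], [130, 138, 122]]   # Table 5.3
table = one_way_anova(fat)
for key, value in table.items():
    print(key, value if key == "df" else round(float(value), 3))
```

The printed values can be compared with Table 5.5.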


Remarks about Program Packages 


With statistical program packages like R, SAS or SPSS, calculations can be done safely and simply. In R we use the command lm().

We demonstrate the analysis of Example 5.3 with IBM SPSS 24 (SPSS for short). Initially the data must be brought into a data matrix. After starting SPSS we use the option ‘Data input’ and define the variables ‘sire’ and ‘fat’. By this we define two columns of the data matrix. In the first column, the 10 daughter performances (fat) are listed. In the second column, we insert the number of the sire to which the daughter performance belongs, in our case four times 1, three times 2 and three times 3. In Figure 5.1 we find sire as factor and the data matrix. We now proceed with

Analyze
  Compare Means
    One-Way ANOVA

and define ‘sire’ as factor and ‘fat’ as dependent variable. By clicking OK we get the result in Figure 5.2.
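The same analysis can be sketched in base R. The block below is only an illustration: it takes the ten fat values and the sire numbers from the data file of Example 5.3, computes the sums of squares of Table 5.5 by hand and then calls aov(); the variable names are chosen here and are not part of the original text.

```r
# Data of Example 5.3 (fat performance of the daughters and number of the sire)
fat  <- c(120, 155, 131, 130, 153, 144, 147, 130, 138, 122)
sire <- factor(c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3))

# Sums of squares computed by hand, following Tables 5.4 and 5.5
N          <- length(fat)
a          <- nlevels(sire)
Y_i        <- tapply(fat, sire, sum)        # sire totals Y_i.
n_i        <- tapply(fat, sire, length)     # numbers of daughters n_i
correction <- sum(fat)^2 / N                # Y..^2 / N
ss_between <- sum(Y_i^2 / n_i) - correction
ss_within  <- sum(fat^2) - sum(Y_i^2 / n_i)
f_value    <- (ss_between / (a - 1)) / (ss_within / (N - a))
c(ss_between = ss_between, ss_within = ss_within, F = f_value)

# The same decomposition with the built-in one-way ANOVA
summary(aov(fat ~ sire))
```

The hand calculation should reproduce the entries 546 and 832 of Table 5.5; summary() additionally reports the P-value of the F-test.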


5.2.2 Planning the Size of an Experiment 
For planning the size of an experiment, precision requirements are needed as in Chapter 3. The following approach is valid for all sections of this chapter.


5.2.2.1 General Description for All Sections of This Chapter 
At first we repeat the density function of the non-central F-distribution. It reads

[Figure 5.1: screenshot of the IBM SPSS Statistics Data Editor with the two variables fat and sire. Rows 1 to 10 contain the fat values 120, 155, 131, 130, 153, 144, 147, 130, 138 and 122; the sire column contains four times 1, three times 2 and three times 3.]

Figure 5.1 Data file of Example 5.3. Source: Reproduced with permission of IBM.


[Figure 5.2: screenshot of the IBM SPSS Statistics Viewer with the one-way ANOVA table. Between Groups: sum of squares 546.000, mean square 273.000. Within Groups: sum of squares 832.000, mean square 118.857. Total: sum of squares 1378.000.]

Figure 5.2 ANOVA output for Example 5.3. Source: Reproduced with permission of IBM.

Analogously to the relation

$$t(n-1 \mid 1-\alpha) = t(n-1, \lambda \mid \beta)$$

in Chapter 3, we now use for the quantiles of the central and the non-central F-distribution, respectively, the relation

$$F(f_1, f_2, 0 \mid 1-\alpha) = F(f_1, f_2, \lambda \mid \beta), \tag{5.12}$$

where $f_1$ and $f_2$ are the degrees of freedom of the numerator and the denominator, respectively, of the test statistic. Further $\alpha$ and $\beta$ are the two risks and $\lambda$ is the non-centrality parameter. This equation plays an important role in all other sections of this chapter. Besides $f_1$, $f_2$, $\alpha$ and $\beta$, the difference $\delta$ between the largest and the smallest effect (main effect or, in the following sections, also interaction effect), to be tested against zero, belongs to the precision requirement. We denote the solution $\lambda$ of (5.12) by

$$\lambda = \lambda(\alpha, \beta, f_1, f_2).$$

Let $E_{\min}$, $E_{\max}$ be the minimum and the maximum of $q$ effects $E_1, E_2, \ldots, E_q$ of a fixed factor E or an interaction. Usually we standardise the precision requirement by the relative precision requirement $\tau = \delta/\sigma$.

If $E_{\max} - E_{\min} \ge \delta$, then for the non-centrality parameter of the F-distribution (for even $q$) with $\bar E = \frac{1}{q}\sum_{i=1}^{q} E_i$ the following holds:


(Pe EY 4 (Ei < Ey 
oD, 2 3 
If we omit 5 , then it follows 
oO 
q 
A= x (E;-E)” /o? > q6/ (20°). (5.13) 


= 


The minimal size of the experiment needed depends on $\lambda$ and thus on the exact position of all $q$ effects. But this is not known when the experiment starts. We consider two extreme cases, the most favourable (resulting in the smallest minimal size $n_{\min}$) and the least favourable (resulting in the largest minimal size $n_{\max}$). The least favourable case leads to the smallest non-centrality parameter $\lambda_{\min}$ and by this to the so-called maximin size $n_{\max}$. This occurs if the $q-2$ non-extreme effects equal $\frac{E_{\max} + E_{\min}}{2}$. For $\bar E = 0$, $\sum_{i=1}^{q}(E_i - \bar E)^2 = 2E^2$; this is shown in the following scheme:

$$E_1 = -E, \quad 0 = E_2 = \cdots = E_{q-1}, \quad E_q = E.$$

The most favourable case leads to the largest non-centrality parameter $\lambda_{\max}$ and by this to the so-called minimin size $n_{\min}$. For even $q = 2m$ this is the case if $m$ of the $E_i$ equal $E_{\min}$ and the $m$ other $E_i$ equal $E_{\max}$. For odd $q = 2m + 1$, again $m$ of the $E_i$ should equal $E_{\min}$ and $m$ other $E_i$ should equal $E_{\max}$, and the remaining effect should be equal to one of the two extremes $E_{\min}$ or $E_{\max}$. For $\bar E = 0$, $\sum_{i=1}^{q}(E_i - \bar E)^2 = qE^2$; this is shown in the following scheme for even $q$:

$$E_1 = \cdots = E_m = -E, \quad E_{m+1} = \cdots = E_q = E.$$

5.2.2.2. The Experimental Size for the One-Way Classification 

We now determine the required experimental size for the most favourable as well as for the least favourable case, that is, we are looking for the smallest $n$ (for instance, $n = 2q$) so that for $\lambda_{\max} = \lambda$ and for $\lambda_{\min} = \lambda$, respectively, (5.13) is fulfilled.

The experimenter must select a size $n$ in the interval $n_{\min} \le n \le n_{\max}$, but if he wants to be on the safe side, he must choose $n = n_{\max}$. Solving Equation (5.12) is laborious and is mostly done by computer programs. The program OPDOE of R allows the determination of the minimal size for the most favourable and the least favourable case in dependence on $\alpha$, $\beta$, $\delta$ and $\sigma$ and the number $a$ of treatments (levels of factor A) for all cases in this chapter. The corresponding algorithm stems from Lenth (1986) and Rasch et al. (1997). In any case one can show that the minimal experimental size is smallest if $n_1 = n_2 = \cdots = n_a = n$, which can be reached by planning the experiment. The design function of the R-package OPDOE for the ANOVA is called size.anova() and for the one-way ANOVA has the form

> size.anova(model="a", a=, alpha=, beta=, delta=, case=)

It calculates the minimal size for any of the $a$ levels of factor A for model I in model="a" and the number of levels a=. Besides the risks, the relative minimal difference $\tau = \delta/\sigma$ (delta) and the strategy of optimisation (case: "maximin" or "minimin") must be put in.

We demonstrate all programs by an example.

Example 5.4 Determine $n_{\min}$ and $n_{\max}$ for $a = 4$, $\alpha = 0.05$, $\beta = 0.1$ and $\tau = \delta/\sigma = 2$.

With OPDOE of R we get

> size.anova(model="a", a=4, alpha=0.05, beta=0.1,
+ delta=2, case="minimin")
n
5

> size.anova(model="a", a=4, alpha=0.05, beta=0.1,
+ delta=2, case="maximin")
n
9

Now a value of $n$ between 5 and 9 must be used.
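The search behind such a program can be illustrated with base R alone. The following is a minimal sketch and not the OPDOE implementation: it assumes a balanced one-way layout with non-centrality parameter $\lambda = n\sum_i (a_i - \bar a)^2/\sigma^2$, uses the least and most favourable configurations of the effects described above, and increases $n$ until the power of the F-test reaches $1-\beta$. The function name size_oneway and its arguments are hypothetical.

```r
# Smallest n per level such that the F-test of size alpha has power >= 1 - beta.
# Assumptions of this sketch: balanced one-way classification with a levels,
# lambda = n * sum((a_i - mean(a))^2) / sigma^2, tau = delta / sigma.
size_oneway <- function(a, alpha, beta, tau,
                        case = c("maximin", "minimin"), n_upper = 1000) {
  case <- match.arg(case)
  # Sum of squared standardised effects in the two extreme configurations
  ssq <- if (case == "maximin") {
    tau^2 / 2                                  # two extremes, the rest in the middle
  } else if (a %% 2 == 0) {
    a * tau^2 / 4                              # half of the effects at each extreme
  } else {
    (a^2 - 1) * tau^2 / (4 * a)                # odd a: m and m + 1 effects at the extremes
  }
  for (n in 2:n_upper) {
    df1   <- a - 1
    df2   <- a * (n - 1)
    crit  <- qf(1 - alpha, df1, df2)                 # central (1 - alpha)-quantile
    power <- 1 - pf(crit, df1, df2, ncp = n * ssq)   # non-central F probability
    if (power >= 1 - beta) return(n)
  }
  NA_integer_
}

size_oneway(a = 4, alpha = 0.05, beta = 0.1, tau = 2, case = "minimin")
size_oneway(a = 4, alpha = 0.05, beta = 0.1, tau = 2, case = "maximin")
```

The two results can be compared with the OPDOE values 5 and 9 of Example 5.4; small deviations would point to a different definition of the non-centrality parameter in the package.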


5.3. Two-Way Analysis of Variance 


The two-way ANOVA is a procedure for experiments to investigate the effects of two factors. Let us investigate $a$ varieties of wheat and $b$ fertilisers in their effect on the yield (per ha). The $a$ varieties as well as the $b$ fertilisers are assumed to be fixed (selected systematically), as always in this chapter with fixed effects. Then factor variety is factor A, and factor fertiliser is factor B. In this and the next chapter, the number of levels of a factor is denoted by the same letter as the factor, but as a small letter instead of a capital letter. So factor A has $a$ levels, and factor B has $b$ levels in the experiment. In experiments with two factors, the experimental material is classified in two directions. For this we list the different possibilities:

1) Observations occur in each level of factor A combined with each level of factor B. There are $a \cdot b$ combinations (classes) of factor levels. We say factor A is completely crossed with factor B or we have a complete cross-classification.

   1.1) For each combination (class) of factor levels, there exists one observation ($n_{ij} = 1$ with $n_{ij}$ defined in 1.2).

   1.2) For each combination (class) $(i, j)$ of the level $i$ of factor A with the level $j$ of factor B, we have $n_{ij} \ge 1$ observations, with at least one $n_{ij} > 1$. If all $n_{ij} = n$, we have a cross-classification with equal class numbers, also called a balanced experimental design.

2) At least one level of factor A occurs together with at least two levels of factor B, and at least one level of factor B occurs together with at least two levels of factor A, but we have no complete cross-classification. Then we say factor A is partially crossed with factor B, or we have an incomplete cross-classification.

3) Each level of factor B occurs together with exactly one level of factor A. This is called a nested classification of factor B within factor A. We also say that factor B is nested within factor A and write $B \prec A$.


The kinds of the two-way classification are as follows:

$n_{ij} = 1$ for all $(i, j)$: complete cross-classification with one observation per class.

$n_{ij} \ge 1$ for all $(i, j)$: complete cross-classification.

$n_{ij} = n \ge 1$ for all $(i, j)$: complete cross-classification with equal class numbers.

$n_{ij_1} \ne 0$; $n_{ij_2} \ne 0$ $(j_1 \ne j_2)$ for at least one $i$ and at least one $n_{ij} = 0$: incomplete cross-classification.

$n_{i_1 j} \ne 0$; $n_{i_2 j} \ne 0$ $(i_1 \ne i_2)$ for at least one $j$ and at least one $n_{ij} = 0$: incomplete cross-classification.

If $n_{kj} \ne 0$, then $n_{ij} = 0$ for $i \ne k$ (at least one $n_{ij} > 1$ and at least two $n_{ij} \ne 0$): nested classification.
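These cases can be told apart mechanically from the matrix of class numbers $n_{ij}$. The following base R sketch is only an illustration; the function classify_two_way and its return strings are hypothetical, and it assumes that every level of A and of B occurs at least once.

```r
# Classify a two-way layout from its matrix of class numbers n_ij
# (rows = levels of A, columns = levels of B).
classify_two_way <- function(n) {
  stopifnot(is.matrix(n), all(n >= 0))
  occupied <- n > 0
  if (all(occupied)) {
    if (all(n == 1)) return("complete cross-classification, one observation per class")
    if (all(n == n[1, 1])) return("complete cross-classification, equal class numbers")
    return("complete cross-classification, unequal class numbers")
  }
  # Nested: every level of one factor occurs with exactly one level of the other
  if (all(colSums(occupied) == 1)) return("nested classification, B within A")
  if (all(rowSums(occupied) == 1)) return("nested classification, A within B")
  "incomplete cross-classification"
}

classify_two_way(matrix(1, nrow = 3, ncol = 2))
classify_two_way(rbind(c(3, 1), c(4, 2), c(2, 0)))
classify_two_way(rbind(c(2, 2, 0, 0), c(0, 0, 2, 3)))
```

The three calls illustrate a layout with one observation per class, an incomplete cross-classification with one empty class and a layout in which B is nested within A.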


5.3.1. Cross-Classification (A x B) 


The observations $y_{ijk}$ of a complete cross-classification for the $i$th level $A_i$ of factor A $(i = 1, \ldots, a)$ and the $j$th level $B_j$ of factor B $(j = 1, \ldots, b)$ in the case $n_{ij} = 1$ can be written in the form of Table 5.7 and in the case of equal class numbers in the form of Table 5.8. W.l.o.g. the levels of factor A are the rows and the levels of factor B are the columns of the tables. The special cases of Tables 5.7 and 5.8 are considered at the end of this section. Initially we consider a universal cross-classification where empty classes may occur. Let the random variables $y_{ijk}$ in class $(i, j)$


Table 5.7 Observations (realisations) $y_{ij}$ of a complete two-way cross-classification with class numbers $n_{ij} = 1$.

                             Levels of factor B
                      B_1      B_2      ...    B_j      ...    B_b
Levels of      A_1    y_11     y_12     ...    y_1j     ...    y_1b
factor A       A_2    y_21     y_22     ...    y_2j     ...    y_2b
               ...
               A_i    y_i1     y_i2     ...    y_ij     ...    y_ib
               ...
               A_a    y_a1     y_a2     ...    y_aj     ...    y_ab


Table 5.8 Observations (realisations) $y_{ijk}$ of a complete two-way cross-classification with class numbers $n_{ij} = n$.

                             Levels of factor B
                      B_1      B_2      ...    B_j      ...    B_b
Levels of      A_1    y_111    y_121    ...    y_1j1    ...    y_1b1
factor A              y_112    y_122    ...    y_1j2    ...    y_1b2
                      ...
                      y_11n    y_12n    ...    y_1jn    ...    y_1bn
               A_2    y_211    y_221    ...    y_2j1    ...    y_2b1
                      y_212    y_222    ...    y_2j2    ...    y_2b2
                      ...
                      y_21n    y_22n    ...    y_2jn    ...    y_2bn
               ...
               A_i    y_i11    y_i21    ...    y_ij1    ...    y_ib1
                      y_i12    y_i22    ...    y_ij2    ...    y_ib2
                      ...
                      y_i1n    y_i2n    ...    y_ijn    ...    y_ibn
               ...
               A_a    y_a11    y_a21    ...    y_aj1    ...    y_ab1
                      y_a12    y_a22    ...    y_aj2    ...    y_ab2
                      ...
                      y_a1n    y_a2n    ...    y_ajn    ...    y_abn


be a random sample of a population associated with this class. Mean and variance of the population of such a class are called true mean and variance, respectively. The true mean of the class $(i, j)$ is denoted by $\eta_{ij}$. Again we consider the case that the levels of factors A and B are chosen systematically (model I). We call

$$\mu = \bar\eta_{..} = \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b}\eta_{ij}$$

the total mean of the experiment.

Definition 5.2 The difference $a_i = \bar\eta_{i.} - \mu$ is called the main effect of the $i$th level of factor A, and the difference $b_j = \bar\eta_{.j} - \mu$ is called the main effect of the $j$th level of factor B. The difference $a_{i|j} = \eta_{ij} - \bar\eta_{.j}$ is called the effect of the $i$th level of factor A under the condition that factor B occurs in the $j$th level. Analogously $b_{j|i} = \eta_{ij} - \bar\eta_{i.}$ is called the effect of the $j$th level of factor B under the condition that factor A occurs in the $i$th level.

The distinction between main effect and ‘conditional effect’ is important if the effects of the levels of one factor depend on the level of the other factor. In ANOVA, we then say that an interaction between the two factors exists. We define the effects of these interactions (and use them in place of the conditional effects).


Definition 5.3 The interaction $(a,b)_{ij}$ between the $i$th level of factor A and the $j$th level of factor B in a two-way cross-classification is the difference between the conditional effect of the level $A_i$ of factor A for a given level $B_j$ of factor B and the main effect of the level $A_i$ of A or, which means the same, the difference between the conditional effect of the level $B_j$ of B for a given level $A_i$ of A and the main effect of the level $B_j$ of B, or as a formula

$$(a,b)_{ij} = a_{i|j} - a_i = b_{j|i} - b_j = \eta_{ij} - \bar\eta_{i.} - \bar\eta_{.j} + \mu. \tag{5.14}$$
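Definitions 5.2 and 5.3 can be checked numerically on a small table of true class means. The sketch below uses a hypothetical 2 x 2 table of class means (the numbers are invented for illustration) and computes main effects, conditional effects and interactions in base R; it also confirms that the two expressions for the interaction agree.

```r
# Hypothetical true class means eta_ij (rows = levels of A, columns = levels of B)
eta <- matrix(c(10, 12,
                14, 20), nrow = 2, byrow = TRUE)

mu    <- mean(eta)                 # total mean
a_eff <- rowMeans(eta) - mu        # main effects a_i
b_eff <- colMeans(eta) - mu        # main effects b_j

# Conditional effect of A_i given B_j: eta_ij minus the mean of column j
cond_a <- sweep(eta, 2, colMeans(eta))

# Interaction as "conditional effect minus main effect" ...
inter <- sweep(cond_a, 1, a_eff)
# ... and via the closed form eta_ij - mean_i. - mean_.j + mu
inter_closed <- eta - outer(rowMeans(eta), colMeans(eta), "+") + mu

all.equal(inter, inter_closed)     # both definitions coincide
rowSums(inter)                     # interactions sum to zero over j
colSums(inter)                     # and over i
```

In a table without interaction, for example with entries 10, 12, 14, 16, the matrix inter would consist of zeros only.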


Under the assumption above the random variable $y_{ijk}$ of the cross-classification varies randomly around the class mean in the form

$$y_{ijk} = \eta_{ij} + e_{ijk}.$$

We assume that the so-called error variables $e_{ijk}$ are independent of each other and $N(0, \sigma^2)$-distributed and write

$$y_{ijk} = \mu + a_i + b_j + (a,b)_{ij} + e_{ijk} \quad (i = 1, \ldots, a;\ j = 1, \ldots, b;\ k = 1, \ldots, n_{ij}) \tag{5.15}$$

with $(a,b)_{ij} = 0$ if $n_{ij} = 0$. If in (5.14) all $(a,b)_{ij} = 0$, we call

$$y_{ijk} = \mu + a_i + b_j + e_{ijk} \quad (i = 1, \ldots, a;\ j = 1, \ldots, b;\ k = 1, \ldots, n_{ij}) \tag{5.16}$$

a model without interactions.

The models (5.15) and (5.16) are special cases of (5.1). To show this we write

$$Y = (y_{111}, \ldots, y_{11n_{11}}, \ldots, y_{ab1}, \ldots, y_{abn_{ab}})^T,$$

$$\beta = (\mu, a_1, \ldots, a_a, b_1, \ldots, b_b, (a,b)_{11}, \ldots, (a,b)_{1b}, \ldots, (a,b)_{a1}, \ldots, (a,b)_{ab})^T$$

for (5.15) and

$$\beta = (\mu, a_1, \ldots, a_a, b_1, \ldots, b_b)^T$$

for (5.16). In (5.15) let $r$ of the $n_{ij}$ be equal to 0 and $ab - r = t$ of the $n_{ij}$ be larger than 0.

If (5.15) is written in matrix notation, then $\beta$ is a $[(t + a + b + 1) \times 1]$ vector $[(a + 1)(b + 1) - r = t + a + b + 1]$ and $X$ an $\{N \times [t + a + b + 1]\}$ matrix of zeros and ones, while $e$ is an $(N \times 1)$ vector of random errors that is $N(0_N, \sigma^2 I_N)$-distributed. Then $Y$ is $N(X\beta, \sigma^2 I_N)$-distributed.


5.3.1.1 Parameter Estimation 
Before we generally discuss the estimation of the model parameters, we consider 
an example. 

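We demonstrate the choice of the matrix $X$ in (5.1) by

Example 5.5 Let $a = b = n = 2$, so that $r = 0$, $t = ab = 4$ and

$$Y = (y_{111}, y_{112}, y_{121}, y_{122}, y_{211}, y_{212}, y_{221}, y_{222})^T,$$

$$e = (e_{111}, e_{112}, e_{121}, e_{122}, e_{211}, e_{212}, e_{221}, e_{222})^T,$$

$$\beta = (\mu, a_1, a_2, b_1, b_2, (a,b)_{11}, (a,b)_{12}, (a,b)_{21}, (a,b)_{22})^T.$$

Then

$$X = \begin{pmatrix}
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0\\
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0\\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0\\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0\\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0\\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1\\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1
\end{pmatrix}$$

is a matrix of rank 4. Further $N = abn$ and

$$X^TX = \begin{pmatrix}
N & bn & bn & an & an & n & n & n & n\\
bn & bn & 0 & n & n & n & n & 0 & 0\\
bn & 0 & bn & n & n & 0 & 0 & n & n\\
an & n & n & an & 0 & n & 0 & n & 0\\
an & n & n & 0 & an & 0 & n & 0 & n\\
n & n & 0 & n & 0 & n & 0 & 0 & 0\\
n & n & 0 & 0 & n & 0 & n & 0 & 0\\
n & 0 & n & n & 0 & 0 & 0 & n & 0\\
n & 0 & n & 0 & n & 0 & 0 & 0 & n
\end{pmatrix}
= \begin{pmatrix}
8 & 4 & 4 & 4 & 4 & 2 & 2 & 2 & 2\\
4 & 4 & 0 & 2 & 2 & 2 & 2 & 0 & 0\\
4 & 0 & 4 & 2 & 2 & 0 & 0 & 2 & 2\\
4 & 2 & 2 & 4 & 0 & 2 & 0 & 2 & 0\\
4 & 2 & 2 & 0 & 4 & 0 & 2 & 0 & 2\\
2 & 2 & 0 & 2 & 0 & 2 & 0 & 0 & 0\\
2 & 2 & 0 & 0 & 2 & 0 & 2 & 0 & 0\\
2 & 0 & 2 & 2 & 0 & 0 & 0 & 2 & 0\\
2 & 0 & 2 & 0 & 2 & 0 & 0 & 0 & 2
\end{pmatrix}.$$

The matrix $B$ in (5.4) following Definitions 5.2 and 5.3 has the form

$$B = \begin{pmatrix}
0 & N & N & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & N & N & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & N & N & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & N & N\\
0 & 0 & 0 & 0 & 0 & N & 0 & N & 0
\end{pmatrix}$$

under the side conditions

$$\sum_{i=1}^{a} a_i = \sum_{j=1}^{b} b_j = 0, \quad \sum_{i=1}^{a} (a,b)_{ij} = 0 \text{ for all } j, \quad \sum_{j=1}^{b} (a,b)_{ij} = 0 \text{ for all } i. \tag{5.17}$$

This leads to

$$B^TB = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & N^2 & N^2 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & N^2 & N^2 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & N^2 & N^2 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & N^2 & N^2 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 2N^2 & N^2 & N^2 & 0\\
0 & 0 & 0 & 0 & 0 & N^2 & N^2 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & N^2 & 0 & 2N^2 & N^2\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & N^2 & N^2
\end{pmatrix}$$

with $rk(B) = rk(B^TB) = 5$ and further

$$X^TX + B^TB = \begin{pmatrix}
N & bn & bn & an & an & n & n & n & n\\
bn & N^2 + bn & N^2 & n & n & n & n & 0 & 0\\
bn & N^2 & N^2 + bn & n & n & 0 & 0 & n & n\\
an & n & n & N^2 + an & N^2 & n & 0 & n & 0\\
an & n & n & N^2 & N^2 + an & 0 & n & 0 & n\\
n & n & 0 & n & 0 & 2N^2 + n & N^2 & N^2 & 0\\
n & n & 0 & 0 & n & N^2 & N^2 + n & 0 & 0\\
n & 0 & n & n & 0 & N^2 & 0 & 2N^2 + n & N^2\\
n & 0 & n & 0 & n & 0 & 0 & N^2 & N^2 + n
\end{pmatrix}$$

$$= \begin{pmatrix}
8 & 4 & 4 & 4 & 4 & 2 & 2 & 2 & 2\\
4 & 68 & 64 & 2 & 2 & 2 & 2 & 0 & 0\\
4 & 64 & 68 & 2 & 2 & 0 & 0 & 2 & 2\\
4 & 2 & 2 & 68 & 64 & 2 & 0 & 2 & 0\\
4 & 2 & 2 & 64 & 68 & 0 & 2 & 0 & 2\\
2 & 2 & 0 & 2 & 0 & 130 & 64 & 64 & 0\\
2 & 2 & 0 & 0 & 2 & 64 & 66 & 0 & 0\\
2 & 0 & 2 & 2 & 0 & 64 & 0 & 130 & 64\\
2 & 0 & 2 & 0 & 2 & 0 & 0 & 64 & 66
\end{pmatrix}.$$

Estimators $\hat\beta$ for $\beta$ we obtain under these side conditions as in Section 5.2.1 by calculating $(X^TX + B^TB)^{-1}$ and $\hat\beta = (X^TX + B^TB)^{-1}X^TY$.

The following statements are independent of the special choice of the side conditions.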
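The construction of Example 5.5 can be retraced in base R. The sketch below builds $X$ for $a = b = n = 2$, checks its rank and $X^TX$, sets up $B$ from the side conditions (5.17) and solves for $\hat\beta$. The observation vector y is hypothetical and only serves to make the code run, since the example gives no data.

```r
# Factor levels of the eight observations y_111, y_112, y_121, ..., y_222
i <- rep(1:2, each = 4)                    # level of factor A
j <- rep(rep(1:2, each = 2), times = 2)    # level of factor B

# Columns: mu, a1, a2, b1, b2, (a,b)11, (a,b)12, (a,b)21, (a,b)22
X <- cbind(1, i == 1, i == 2, j == 1, j == 2,
           i == 1 & j == 1, i == 1 & j == 2,
           i == 2 & j == 1, i == 2 & j == 2)

qr(X)$rank            # numerical rank of X (the text states rank 4)
XtX <- crossprod(X)   # X^T X, to be compared with the matrix in the text
XtX

# Side conditions (5.17) written as B beta = 0, scaled by N as in the text
N <- nrow(X)
B <- N * rbind(c(0, 1, 1, 0, 0, 0, 0, 0, 0),   # a1 + a2 = 0
               c(0, 0, 0, 1, 1, 0, 0, 0, 0),   # b1 + b2 = 0
               c(0, 0, 0, 0, 0, 1, 1, 0, 0),   # (a,b)11 + (a,b)12 = 0
               c(0, 0, 0, 0, 0, 0, 0, 1, 1),   # (a,b)21 + (a,b)22 = 0
               c(0, 0, 0, 0, 0, 1, 0, 1, 0))   # (a,b)11 + (a,b)21 = 0

y <- c(10, 12, 14, 15, 11, 13, 18, 20)         # hypothetical observations

beta_hat <- solve(XtX + crossprod(B), crossprod(X, y))
round(drop(B %*% beta_hat), 8)   # the side conditions should hold (zeros)
drop(X %*% beta_hat)             # fitted values
tapply(y, list(i, j), mean)      # class means for comparison
```

Under these assumptions the fitted values should coincide with the class means, whatever side conditions are chosen; only the individual components of $\hat\beta$ depend on $B$.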


Theorem 5.5 The matrix $X^TX$ of the model equation (5.15), written in the form

$$Y = X\beta + e$$

with the $[N \times (t + a + b + 1)]$ matrix $X$, has rank $t > 0$, and a solution of the normal equations $X^TX\hat\beta = X^TY$ is given by

$$\widehat{(a,b)}_{ij} = \bar y_{ij.} \text{ for all } i, j \text{ with } n_{ij} > 0, \quad \hat a_i = 0 \text{ for all } i, \quad \hat b_j = 0 \text{ for all } j, \quad \hat\mu = 0. \tag{5.18}$$

Proof: We write $X = (x_1, x_2, \ldots, x_{t+a+b+1})$ with the column vectors $x_l$ of $X$. We easily see that $\sum_{i=2}^{a+1} x_i = \sum_{j=a+2}^{a+b+1} x_j = x_1$. Adding up those $x_l$ $(l = a + b + 2, \ldots, a + b + t + 1)$ that correspond to all $(a,b)_{ij}$ for a given $i$, we obtain $x_{i+1}$. Adding up all those $x_l$ that correspond to the $(a,b)_{ij}$ for a given $j$, we obtain $x_{a+1+j}$. That means that of the $t + a + b + 1$ rows of $X^TX$, at most $t$ are linearly independent; because the last $t$ rows and columns of $X^TX$ form a diagonal matrix with $t$ non-zero elements, we have $rk(X^TX) = t$. We put $a + b + 1$ values of $\hat\beta$ equal to 0, namely $\hat\mu, \hat a_1, \ldots, \hat a_a, \hat b_1, \ldots, \hat b_b$. The last $t$ equations of the system of normal equations then yield the solution (5.18).

When all $(a,b)_{ij} = 0$, that is, when model equation (5.16) has to be used, we obtain


Theorem 5.6 When all $(a,b)_{ij} = 0$, then the matrix $X^TX$ of the model equation (5.16), written in the form $Y = X\beta + e$ with the $[N \times (a + b + 1)]$ matrix $X$, has the rank $rk(X^TX) = rk(X) \le a + b - 1$.

Proof: $X^TX$ is a symmetrical matrix of order $a + b + 1$. The sum of the second up to the $(a + 1)$th row equals the first row; the $(a + 2)$th up to the last row also add up to the first one, so that the rank is at most $a + b - 1$.


Before solutions of the normal equations for model equation (5.16) are given, 
we list some estimable functions and their BLUEs for model (5.15). 


5.3.1.1.1 Models with Interactions 
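We consider the model equation (5.15). Because $E(Y)$ is estimable,

$$\eta_{ij} = \mu + a_i + b_j + (a,b)_{ij} \quad \text{for all } i, j \text{ with } n_{ij} > 0$$

is estimable. The BLUE of $\eta_{ij}$ is

$$\hat\eta_{ij} = \hat\mu + \hat a_i + \hat b_j + \widehat{(a,b)}_{ij} = \bar y_{ij.} \tag{5.19}$$

because $\hat\mu + \hat a_i + \hat b_j = 0$ and $\widehat{(a,b)}_{ij} = \bar y_{ij.}$. From (5.6) it follows

$$cov(\hat\eta_{ij}, \hat\eta_{kl}) = \frac{\sigma^2}{n_{ij}}\,\delta_{ik}\delta_{jl}. \tag{5.20}$$

It is now easy to show that differences between the $a_i$ or between the $b_j$ are not estimable. All estimable functions of the components of (5.15) without further side conditions contain interaction effects $(a,b)_{ij}$. This leads to the theorem below.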


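Theorem 5.7 (Searle, 1971) The function

$$L_A = a_i - a_k + \sum_{j=1}^{b} c_{ij}\left(b_j + (a,b)_{ij}\right) - \sum_{j=1}^{b} c_{kj}\left(b_j + (a,b)_{kj}\right) \quad \text{for } i \ne k \tag{5.21}$$

or analogously

$$L_B = b_j - b_l + \sum_{i=1}^{a} d_{ij}\left(a_i + (a,b)_{ij}\right) - \sum_{i=1}^{a} d_{il}\left(a_i + (a,b)_{il}\right) \quad \text{for } j \ne l$$

is estimable if $c_{rs} = 0$ for $n_{rs} = 0$ and $d_{rs} = 0$ for $n_{rs} = 0$, respectively, as well as

$$\sum_{j=1}^{b} c_{ij} = \sum_{j=1}^{b} c_{kj} = 1 \quad \left(\text{and } \sum_{i=1}^{a} d_{ij} = \sum_{i=1}^{a} d_{il} = 1, \text{ respectively}\right).$$

The BLUE of an estimable function of the form (5.21) is given by

$$\hat L_A = \sum_{j=1}^{b} c_{ij}\,\bar y_{ij.} - \sum_{j=1}^{b} c_{kj}\,\bar y_{kj.}, \tag{5.22}$$

and it is

$$var(\hat L_A) = \sigma^2 \sum_{j=1}^{b}\left(\frac{c_{ij}^2}{n_{ij}} + \frac{c_{kj}^2}{n_{kj}}\right). \tag{5.23}$$

Proof: An estimable function is always a linear combination of the $\eta_{ij}$. Therefore $c_{rs} = 0$ if $n_{rs} = 0$. Now

$$\sum_{j=1}^{b} c_{ij}\,\eta_{ij} - \sum_{j=1}^{b} c_{kj}\,\eta_{kj}$$

as a linear function of the $\eta_{ij}$ is estimable. Because of

$$\sum_{j=1}^{b} c_{ij}\,\eta_{ij} = \sum_{j=1}^{b} c_{ij}\left(\mu + a_i + b_j + (a,b)_{ij}\right) = \mu + a_i + \sum_{j=1}^{b} c_{ij}\left(b_j + (a,b)_{ij}\right)$$

and the analogous relation for the corresponding term in $c_{kj}$, the estimability of $L_A$ and the validity of (5.22) and (5.23) follow.

If we use model equation (5.16) without interactions and side conditions, then $\eta_{ij} = E(y_{ijk}) = \mu + a_i + b_j$ is an estimable function; the differences $a_i - a_k$ and $b_j - b_l$ are estimable.

We consider the following example.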

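Example 5.6 From three test periods of testing pig fattening for male and female offspring of boars, the number of fattening days an animal needed to grow from 40 to 110 kg has been recorded. The values are given in Table 5.9.

Table 5.9 Results of testing pig fattening: fattening days (from 40 to 110 kg for three test periods and two sexes) for the offspring of several boars.

                      Sex
Test period    Male               Female
1              91, 84, 86         99
2              94, 92, 90, 96     97, 89
3              82, 86             -

We choose model equation (5.15) as a basis and write it in the form

$$\begin{pmatrix} 91\\ 84\\ 86\\ 99\\ 94\\ 92\\ 90\\ 96\\ 97\\ 89\\ 82\\ 86 \end{pmatrix}
= \left(\begin{array}{ccccccccccc}
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0\\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0\\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1\\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right)
\begin{pmatrix} \mu\\ a_1\\ a_2\\ a_3\\ b_1\\ b_2\\ (a,b)_{11}\\ (a,b)_{12}\\ (a,b)_{21}\\ (a,b)_{22}\\ (a,b)_{31} \end{pmatrix}
+ \begin{pmatrix} e_{111}\\ e_{112}\\ e_{113}\\ e_{121}\\ e_{211}\\ e_{212}\\ e_{213}\\ e_{214}\\ e_{221}\\ e_{222}\\ e_{311}\\ e_{312} \end{pmatrix}.$$

We have $r = 1$, $t = 3 \cdot 2 - 1 = 5$ and $N = 12$; $X$ is a $(12 \times 11)$ matrix of rank 5. We obtain

$$X^TX = \left(\begin{array}{ccccccccccc}
12 & 4 & 6 & 2 & 9 & 3 & 3 & 1 & 4 & 2 & 2\\
4 & 4 & 0 & 0 & 3 & 1 & 3 & 1 & 0 & 0 & 0\\
6 & 0 & 6 & 0 & 4 & 2 & 0 & 0 & 4 & 2 & 0\\
2 & 0 & 0 & 2 & 2 & 0 & 0 & 0 & 0 & 0 & 2\\
9 & 3 & 4 & 2 & 9 & 0 & 3 & 0 & 4 & 0 & 2\\
3 & 1 & 2 & 0 & 0 & 3 & 0 & 1 & 0 & 2 & 0\\
3 & 3 & 0 & 0 & 3 & 0 & 3 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0\\
4 & 0 & 4 & 0 & 4 & 0 & 0 & 0 & 4 & 0 & 0\\
2 & 0 & 2 & 0 & 0 & 2 & 0 & 0 & 0 & 2 & 0\\
2 & 0 & 0 & 2 & 2 & 0 & 0 & 0 & 0 & 0 & 2
\end{array}\right)$$

and from (5.18)

$$\hat\beta = (0, 0, 0, 0, 0, 0, 87, 99, 93, 93, 84)^T.$$

The function $L_1 = b_1 - b_2 + (a,b)_{11} - (a,b)_{12}$ is estimable, because the condition of Theorem 5.7 is fulfilled. The function $L_2 = b_1 - b_2 + (a,b)_{21} - (a,b)_{22}$ is also estimable. With the class means $\bar y_{11.} = 87$, $\bar y_{12.} = 99$, $\bar y_{21.} = 93$ and $\bar y_{22.} = 93$ we get

$$\hat L_1 = \bar y_{11.} - \bar y_{12.} = -12, \quad \hat L_2 = \bar y_{21.} - \bar y_{22.} = 0.$$

Further, by (5.23), $var(\hat L_1) = \left(\frac{1}{3} + 1\right)\sigma^2 = \frac{4}{3}\sigma^2$ and $var(\hat L_2) = \left(\frac{1}{4} + \frac{1}{2}\right)\sigma^2 = \frac{3}{4}\sigma^2$.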
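The class means, the estimates of $L_1$ and $L_2$ and the factors of $\sigma^2$ in their variances can be recomputed from the data of Table 5.9. The base R sketch below is an illustration only; the variable names are chosen here, and male is taken as level 1 and female as level 2 of factor B, as in the design matrix above.

```r
# Fattening days of Table 5.9 with test period (factor A) and sex (factor B)
days   <- c(91, 84, 86, 99, 94, 92, 90, 96, 97, 89, 82, 86)
period <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3)
sex    <- c("male", "male", "male", "female",
            "male", "male", "male", "male", "female", "female",
            "male", "male")

cell_mean <- tapply(days, list(period, sex), mean)    # class means (NA for the empty class)
cell_n    <- tapply(days, list(period, sex), length)  # class numbers n_ij

# BLUEs (5.22) of L1 and L2: differences of class means within a test period
L1_hat <- cell_mean["1", "male"] - cell_mean["1", "female"]
L2_hat <- cell_mean["2", "male"] - cell_mean["2", "female"]

# Factors of sigma^2 in the variances (5.23)
var_factor_L1 <- 1 / cell_n["1", "male"] + 1 / cell_n["1", "female"]
var_factor_L2 <- 1 / cell_n["2", "male"] + 1 / cell_n["2", "female"]

c(L1_hat = L1_hat, L2_hat = L2_hat)
c(var_factor_L1 = var_factor_L1, var_factor_L2 = var_factor_L2)
```

The class means obtained this way are the non-zero components of the solution (5.18).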


5.3.1.1.2 Models Without Interactions 
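Model equation (5.16) is simpler than (5.15), but there exists nevertheless no simple solution of the normal equations as for (5.15). The matrix $X^TX$ is

$$X^TX = \begin{pmatrix}
N & n_{1.} & \cdots & n_{a.} & n_{.1} & \cdots & n_{.b}\\
n_{1.} & n_{1.} & & 0 & n_{11} & \cdots & n_{1b}\\
\vdots & & \ddots & & \vdots & & \vdots\\
n_{a.} & 0 & & n_{a.} & n_{a1} & \cdots & n_{ab}\\
n_{.1} & n_{11} & \cdots & n_{a1} & n_{.1} & & 0\\
\vdots & \vdots & & \vdots & & \ddots & \\
n_{.b} & n_{1b} & \cdots & n_{ab} & 0 & & n_{.b}
\end{pmatrix}.$$

To obtain a simpler solution, we must rename for $a < b$ the factors w.l.o.g. so that $a \ge b$. Because $X^TX$ following Theorem 5.6 has a rank of at most $a + b - 1$, we can choose two values of $\beta^*$ arbitrarily. We put $\mu^* = b_b^* = 0$ and obtain the reduced system of normal equations

$$\begin{pmatrix}
n_{1.} & & 0 & n_{11} & \cdots & n_{1,b-1}\\
 & \ddots & & \vdots & & \vdots\\
0 & & n_{a.} & n_{a1} & \cdots & n_{a,b-1}\\
n_{11} & \cdots & n_{a1} & n_{.1} & & 0\\
\vdots & & \vdots & & \ddots & \\
n_{1,b-1} & \cdots & n_{a,b-1} & 0 & & n_{.b-1}
\end{pmatrix}
\begin{pmatrix} a_1^*\\ \vdots\\ a_a^*\\ b_1^*\\ \vdots\\ b_{b-1}^* \end{pmatrix}
= \begin{pmatrix} Y_{1..}\\ \vdots\\ Y_{a..}\\ Y_{.1.}\\ \vdots\\ Y_{.b-1.} \end{pmatrix}.$$

We put

$$D_a = \begin{pmatrix} n_{1.} & & 0\\ & \ddots & \\ 0 & & n_{a.} \end{pmatrix}, \quad
V = \begin{pmatrix} n_{11} & \cdots & n_{1,b-1}\\ \vdots & & \vdots\\ n_{a1} & \cdots & n_{a,b-1} \end{pmatrix}, \quad
D_b = \begin{pmatrix} n_{.1} & & 0\\ & \ddots & \\ 0 & & n_{.b-1} \end{pmatrix}.$$

Now the matrix of coefficients of the reduced system of normal equations can be written as

$$R = \begin{pmatrix} D_a & V\\ V^T & D_b \end{pmatrix}.$$

We put

$$W = -V^TD_a^{-1}V + D_b \tag{5.24}$$

and assume that $R$ has rank $a + b - 1$. Then $W^{-1}$ exists. Further, we obtain

$$R^{-1} = \begin{pmatrix} D_a^{-1} + D_a^{-1}VW^{-1}V^TD_a^{-1} & -D_a^{-1}VW^{-1}\\ -W^{-1}V^TD_a^{-1} & W^{-1} \end{pmatrix}$$

so that with

$$v = Y_b - V^TD_a^{-1}Y_a, \quad v = (v_1, \ldots, v_{b-1})^T,$$

$$v_j = Y_{.j.} - \sum_{i=1}^{a} n_{ij}\,\bar y_{i..}, \quad \bar y_a = (\bar y_{1..}, \ldots, \bar y_{a..})^T,$$

$$Y_a = (Y_{1..}, \ldots, Y_{a..})^T, \quad Y_b = (Y_{.1.}, \ldots, Y_{.b-1.})^T$$

the vector

$$\beta^* = \begin{pmatrix} 0\\ \bar y_a - D_a^{-1}VW^{-1}v\\ W^{-1}v\\ 0 \end{pmatrix} \tag{5.25}$$

is the solution of the system of normal equations and

$$(X^TX)^- = \begin{pmatrix}
0 & 0_a^T & 0_{b-1}^T & 0\\
0_a & D_a^{-1} + D_a^{-1}VW^{-1}V^TD_a^{-1} & -D_a^{-1}VW^{-1} & 0_a\\
0_{b-1} & -W^{-1}V^TD_a^{-1} & W^{-1} & 0_{b-1}\\
0 & 0_a^T & 0_{b-1}^T & 0
\end{pmatrix} \tag{5.26}$$

is the corresponding generalised inverse.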


Definition 5.4 An (incomplete) cross-classification is called connected if $W = (w_{ij})$ $(i, j = 1, \ldots, b-1)$ in (5.24) is non-singular. If $|W| = 0$, then the cross-classification is disconnected (see also a corresponding definition in Chapter 12).

Example 5.7 We consider a two-way cross-classification with $a = 5$, $b = 4$ and the subclass numbers:

                      Levels of B
                 B_1    B_2    B_3    B_4
Levels of A  A_1  n      n      0      0
             A_2  n      n      0      0
             A_3  n      n      0      0
             A_4  0      0      m      m
             A_5  0      0      m      m

Here is $n_{.1} = n_{.2} = 3n$, $n_{.3} = n_{.4} = 2m$, $n_{1.} = n_{2.} = n_{3.} = 2n$, $n_{4.} = n_{5.} = 2m$, and the matrix $W$ is given by

$$W = \begin{pmatrix} \frac{3}{2}n & -\frac{3}{2}n & 0\\ -\frac{3}{2}n & \frac{3}{2}n & 0\\ 0 & 0 & m \end{pmatrix}.$$

The first row is $(-1)$ times the second row so that $W$ is singular. The term ‘disconnected cross-classification’ can be illustrated by this example as follows. From the scheme of the subclass numbers, we see that the levels $A_1, A_2, A_3, B_1, B_2$ and $A_4, A_5, B_3, B_4$ form two separate cross-classifications. If we add $n$ further observations in $(A_1, B_3)$, we obtain $n_{1.} = 3n$, $n_{.3} = 2m + n$, and $W$ becomes

$$W = \begin{pmatrix} \frac{5}{3}n & -\frac{4}{3}n & -\frac{n}{3}\\ -\frac{4}{3}n & \frac{5}{3}n & -\frac{n}{3}\\ -\frac{n}{3} & -\frac{n}{3} & m + \frac{2}{3}n \end{pmatrix}$$

with $|W| \ne 0$; now the cross-classification is connected.
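The check for connectedness in Definition 5.4 only needs the matrix of subclass numbers. The following base R sketch computes $W$ from (5.24) and tests whether it has full rank. The helper w_matrix and the numerical values $n = 2$, $m = 3$ are assumptions made for this illustration.

```r
# W = D_b - V^T D_a^{-1} V from (5.24) for a matrix of subclass numbers n_ij
# (rows = levels of A, columns = levels of B; the last level of B is dropped).
w_matrix <- function(n_ij) {
  a   <- nrow(n_ij)
  b   <- ncol(n_ij)
  V   <- n_ij[, 1:(b - 1), drop = FALSE]
  D_a <- diag(rowSums(n_ij), nrow = a)
  D_b <- diag(colSums(n_ij)[1:(b - 1)], nrow = b - 1)
  D_b - t(V) %*% solve(D_a) %*% V
}

is_connected <- function(n_ij) {
  W <- w_matrix(n_ij)
  qr(W)$rank == nrow(W)          # W non-singular <=> connected
}

n <- 2; m <- 3                   # hypothetical subclass numbers
counts <- rbind(c(n, n, 0, 0),
                c(n, n, 0, 0),
                c(n, n, 0, 0),
                c(0, 0, m, m),
                c(0, 0, m, m))
w_matrix(counts)
is_connected(counts)             # scheme of Example 5.7

counts_added <- counts
counts_added[1, 3] <- n          # n further observations in (A_1, B_3)
w_matrix(counts_added)
is_connected(counts_added)
```

Because of the symmetry among $A_1$, $A_2$ and $A_3$, adding the observations to any of these three rows in column $B_3$ leads to the same $W$.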


In SPSS we see directly for a cross-classification of A with $a$ levels and B with $b$ levels that the scheme is disconnected if $df(A) < a - 1$ and/or $df(B) < b - 1$ in the ANOVA table.

For special cases the two-way cross-classification as complete block designs or balanced and partially balanced incomplete block designs is discussed in Chapter 12, where only connected designs are used.


5.3.1.2 Testing Hypotheses
In this section testable hypotheses and tests of such hypotheses are considered. The models (5.15) and (5.16) are handled separately.

5.3.1.2.1 Models Without Interactions
We start with model (5.16) and assume a connected cross-classification ($W$ in Definition 5.4 non-singular), that is, $rk(X^TX) = a + b - 1$. For a testable hypothesis $K^T\beta = 0$, we can use the test statistic F in (4.34), that is, the test statistic reads

$$F = \frac{\hat\beta^TK\left[K^T(X^TX)^-K\right]^{-1}K^T\hat\beta}{Y^T\left[I_N - X(X^TX)^-X^T\right]Y} \cdot \frac{N - p}{q} \tag{5.27}$$

and is $F(q, N - p, \lambda)$-distributed with non-centrality parameter

$$\lambda = \frac{1}{\sigma^2}\,\beta^TK\left[K^T(X^TX)^-K\right]^{-1}K^T\beta, \quad p = rk(X^TX), \quad q = rk(K).$$

$K^T\beta = 0$ leads to $\lambda = 0$. Because $K^T\beta = 0$ is assumed to be testable, all rows of $K^T\beta$ must be estimable functions. To show how (5.27) is used, we consider an example.


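Example 5.8 The hypothesis $H_0: b_1 = \cdots = b_b$ is to be tested. Initially we investigate whether $H_0$ is testable. We write $H_0$ in the form $H_0: b_j - b_b = 0$ $(j = 1, \ldots, b-1)$ with

$$K^T = \begin{pmatrix}
0 & \cdots & 0 & 1 & & 0 & -1\\
\vdots & & \vdots & & \ddots & & \vdots\\
0 & \cdots & 0 & 0 & & 1 & -1
\end{pmatrix} = \left(0_{b-1,a+1},\ I_{b-1},\ -1_{b-1}\right),$$

so that $K^T(X^TX)^-$ with $(X^TX)^-$ from (5.26) becomes

$$K^T(X^TX)^- = \left(0_{b-1},\ -W^{-1}V^TD_a^{-1},\ W^{-1},\ 0_{b-1}\right)$$

and $K^T(X^TX)^-K = W^{-1}$. Further with $\beta^*$ from (5.25), we have

$$K^T\beta^* = W^{-1}v,$$

and the numerator of F becomes

$$v^TW^{-1}\left(W^{-1}\right)^{-1}W^{-1}v = v^TW^{-1}v.$$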

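To test the hypothesis $H_0: a_1 = \cdots = a_a$, we have to use another generalised inverse than (5.26). We choose $\tilde\mu = 0$ and $\tilde a_1 = 0$ and obtain a reduced system of normal equations; in its matrix the first two rows and columns contain zeros. Let now

$$D_a = \begin{pmatrix} n_{2.} & & 0\\ & \ddots & \\ 0 & & n_{a.} \end{pmatrix}, \quad
D_b = \begin{pmatrix} n_{.1} & & 0\\ & \ddots & \\ 0 & & n_{.b} \end{pmatrix}, \quad
V = \begin{pmatrix} n_{21} & \cdots & n_{2b}\\ \vdots & & \vdots\\ n_{a1} & \cdots & n_{ab} \end{pmatrix}$$

and $\tilde W = D_a - VD_b^{-1}V^T$. The matrix of coefficients

$$\begin{pmatrix} D_a & V\\ V^T & D_b \end{pmatrix}$$

must have (full) rank $a + b - 1$ so that $\tilde W^{-1}$ exists. Then

$$\begin{pmatrix} D_a & V\\ V^T & D_b \end{pmatrix}^{-1} = \begin{pmatrix} \tilde W^{-1} & -\tilde W^{-1}VD_b^{-1}\\ -D_b^{-1}V^T\tilde W^{-1} & D_b^{-1} + D_b^{-1}V^T\tilde W^{-1}VD_b^{-1} \end{pmatrix}$$

follows. Putting $\tilde v = (\tilde v_2, \ldots, \tilde v_a)^T$ with $\tilde v_i = Y_{i..} - \sum_{j=1}^{b} n_{ij}\,\bar y_{.j.}$ we get

$$\tilde\beta = \begin{pmatrix} 0\\ 0\\ \tilde W^{-1}\tilde v\\ \bar y_b - D_b^{-1}V^T\tilde W^{-1}\tilde v \end{pmatrix} \tag{5.25a}$$

analogously to (5.25) with $\bar y_b = (\bar y_{.1.}, \ldots, \bar y_{.b.})^T$. In this case

$$(X^TX)^- = \begin{pmatrix} 0_{2,2} & 0_{2,a-1} & 0_{2,b}\\ 0_{a-1,2} & \tilde W^{-1} & -\tilde W^{-1}VD_b^{-1}\\ 0_{b,2} & -D_b^{-1}V^T\tilde W^{-1} & D_b^{-1} + D_b^{-1}V^T\tilde W^{-1}VD_b^{-1} \end{pmatrix}. \tag{5.26a}$$

With the generalised inverse (5.26) and the solution (5.25) we obtain

$$\beta^{*T}X^TY = \left(\bar y_a - D_a^{-1}VW^{-1}v\right)^TY_a + \left(W^{-1}v\right)^TY_b = \sum_{i=1}^{a}\frac{Y_{i..}^2}{n_{i.}} + v^TW^{-1}v$$

(with $D_a$, $V$, $W$ and $v$ as in (5.24) and (5.25)), and with (5.26a) and (5.25a)

$$\tilde\beta^TX^TY = \sum_{j=1}^{b}\frac{Y_{.j.}^2}{n_{.j}} + \tilde v^T\tilde W^{-1}\tilde v.$$

Because $\hat\beta^TX^TY$ is independent of the special solution $\hat\beta$ of the system of normal equations, we have

$$\sum_{i=1}^{a}\frac{Y_{i..}^2}{n_{i.}} + v^TW^{-1}v = \sum_{j=1}^{b}\frac{Y_{.j.}^2}{n_{.j}} + \tilde v^T\tilde W^{-1}\tilde v,$$

and it follows that

$$\tilde v^T\tilde W^{-1}\tilde v = \sum_{i=1}^{a}\frac{Y_{i..}^2}{n_{i.}} - \sum_{j=1}^{b}\frac{Y_{.j.}^2}{n_{.j}} + v^TW^{-1}v.$$

Therefore it is sufficient to calculate any generalised inverse and the corresponding solution $\beta^*$.

From the numerator of the F-statistic for the test of $H_0: b_1 = \cdots = b_b$, the numerator of the F-statistic for testing $H_0: a_1 = \cdots = a_a$ can easily be derived. Because of $\hat\beta = (X^TX)^-X^TY$, it follows that

$$Y^T\left(I_N - X(X^TX)^-X^T\right)Y = Y^TY - \hat\beta^TX^TY,$$

and the test statistic for $H_0: a_1 = \cdots = a_a$ is

$$F = \frac{\sum_{i=1}^{a}\frac{Y_{i..}^2}{n_{i.}} - \sum_{j=1}^{b}\frac{Y_{.j.}^2}{n_{.j}} + v^TW^{-1}v}{\sum_{i,j,k} y_{ijk}^2 - \sum_{i=1}^{a}\frac{Y_{i..}^2}{n_{i.}} - v^TW^{-1}v} \cdot \frac{N - a - b + 1}{a - 1}$$

and for $H_0: b_1 = \cdots = b_b$ correspondingly

$$F = \frac{v^TW^{-1}v}{\sum_{i,j,k} y_{ijk}^2 - \sum_{i=1}^{a}\frac{Y_{i..}^2}{n_{i.}} - v^TW^{-1}v} \cdot \frac{N - a - b + 1}{b - 1}.$$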

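If in (5.16) $n_{ij} = n$ (equal subclass numbers), simplifications for the tests of hypotheses about $a$ and $b$ result. Furthermore, we can construct an ANOVA table in which $SS_A$, $SS_B$ and $SS_{res} = SS_R$ add up to $SS_{total} = SS_T$.


Theorem 5.8 If in model equation (5.16) $n_{ij} = n \ge 1$ for all $i$ and $j$, then the sum of squared deviations of the $y_{ijk}$ from the total mean $\bar y_{...}$ of the experiment

$$SS_T = Y^TY - N\bar y_{...}^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\left(y_{ijk} - \bar y_{...}\right)^2$$

can be written as

$$SS_T = SS_A + SS_B + SS_{res}$$

with

$$SS_A = \frac{1}{bn}\sum_{i=1}^{a} Y_{i..}^2 - \frac{1}{N}Y_{...}^2, \quad SS_B = \frac{1}{an}\sum_{j=1}^{b} Y_{.j.}^2 - \frac{1}{N}Y_{...}^2, \quad SS_{res} = SS_T - SS_A - SS_B.$$

$SS_A$, $SS_B$ and $SS_{res}$ are independently distributed, and for normally distributed $y_{ijk}$, $\frac{1}{\sigma^2}SS_A$ is $CS(a - 1, \lambda_a)$-, $\frac{1}{\sigma^2}SS_B$ is $CS(b - 1, \lambda_b)$- and $\frac{1}{\sigma^2}SS_{res}$ is $CS(N - a - b + 1)$-distributed with

$$\lambda_a = \frac{bn}{\sigma^2}\sum_{i=1}^{a}\left(a_i - \bar a_.\right)^2, \quad \lambda_b = \frac{an}{\sigma^2}\sum_{j=1}^{b}\left(b_j - \bar b_.\right)^2.$$

These formulae are summarised in Table 5.10.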


Example 5.9 Two forage crops (green rye and lucerne) have been investigated concerning their loss of carotene during storage. For this purpose four storage possibilities (glass jar in a refrigerator, glass jar in a barn, sack in a refrigerator and sack in a barn) were chosen. The loss during storage was defined as the difference between the content of carotene at the start and the content of carotene after storing for 300 days (in percent of dry mass). The question is whether the kind of storage and/or the forage crop influences the loss during storage. We denote the kind of storage as factor A and the forage crop as factor B; the observations (differences $y_{ij}$) can be arranged in the form of Table 5.7. Table 5.11 shows these values.


Table 5.10 Analysis of variance table of a two-way cross-classification with equal subclass numbers ($n_{ij} = n$).

Source of variation          SS                                                 df               MS                                   F
Between the levels of A      SS_A = (1/(bn)) Sum_i Y_i..^2 - (1/N) Y...^2       a - 1            MS_A = SS_A / (a - 1)                F_A = MS_A / MS_res
Between the levels of B      SS_B = (1/(an)) Sum_j Y_.j.^2 - (1/N) Y...^2       b - 1            MS_B = SS_B / (b - 1)                F_B = MS_B / MS_res
Residual                     SS_res = SS_T - SS_A - SS_B                        N - a - b + 1    MS_res = SS_res / (N - a - b + 1)
Total                        SS_T = Sum_ijk y_ijk^2 - (1/N) Y...^2              N - 1


Table 5.11 Observations (loss during storage in percent of dry mass during storage of 300 days) of the experiment of Example 5.9 and results of first calculations.

                               Forage crop
Kind of storage          Green rye    Lucerne      Y_i.        Y_i.^2       Sum_j y_ij^2
Glass in refrigerator    8.39         9.44         17.83       317.9089     159.5057
Glass in barn            11.58        12.21        23.79       565.9641     283.1805
Sack in refrigerator     5.42         5.56         10.98       120.5604     60.2900
Sack in barn             9.53         10.39        19.92       396.8064     198.7730
Y_.j                     34.92        37.60        72.52       1401.2398
Y_.j^2                   1219.4064    1413.7600    2633.1664
Sum_i y_ij^2             324.6858     377.0634     701.7492

Because forage crops and kinds of storage have been selected consciously, we use for the $y_{ij}$ model I and (5.16) as the model equation.

The ANOVA assumes that the observations are realisations of random variables that are independent of each other, have equal variances and are normally distributed. Table 5.11 shows further results of the calculation. Table 5.12 is the ANOVA table following Table 5.10. As the F-tests show, the kind of storage has a clearly significant influence on the loss during storage (F = 186.7 against F(3, 3 | 0.95) = 9.28). The difference between the forage crops is far weaker: F = 11.63 exceeds F(1, 3 | 0.95) = 10.13 only slightly ($\alpha = 0.05$).

How many observations per factor level combination are needed to test the effects of the factor ‘kind of storage’ with the following precision requirements: $a = 4$, $b = 2$, $\alpha = 0.05$, $\beta = 0.1$ and $\delta/\sigma = 2$?

Table 5.12 Analysis of variance table of Example 5.9. 


Source of variation SS df MS F 
Between the storages 43.2261 3 14.4087 186.7 
Between the forage crops 0.8978 1 0.8978 11.63 
Residual 0.2315 3 0.0772 

Total 44.3554 7 



Hints for Programs 
With OPDOE of R we put in

size.anova(model="axb", hypothesis="a", a=4, b=2,
           alpha=0.05, beta=0.1, delta=2, cases="maximin")

and

size.anova(model="axb", hypothesis="a", a=4, b=2,
           alpha=0.05, beta=0.1, delta=2, cases="minimin")


We obtain the output 


We plan therefore experiments with 3 up to 5 subclass numbers. 


5.3.1.2.2 Models with Interactions 
We consider now model (5.15) and assume a connected cross-classification. Also in this case a testable hypothesis $K^T\beta = 0$ can be tested by the statistic (5.27) if the $y_{ijk}$ are $N\left(\mu + a_i + b_j + (a,b)_{ij}, \sigma^2\right)$-distributed. Now $\beta$ has the form

$$\beta = (\mu, a_1, \ldots, a_a, b_1, \ldots, b_b, (a,b)_{11}, \ldots, (a,b)_{ab})^T.$$

Each estimable function is a linear function of

$$E(\bar y_{ij.}) = \eta_{ij} = \mu + a_i + b_j + (a,b)_{ij}.$$

A testable hypothesis $K^T\beta = 0$ has also the form $K^T\beta = T^T\eta = 0$ with the vector $\eta = (\eta_{11}, \ldots, \eta_{ab})^T$ with the $t$ components $\eta_{ij}$ for which $n_{ij} > 0$. By this we obtain (5.27) due to (5.18) and

$$(X^TX)^- = \begin{pmatrix} 0_{a+b+1,a+b+1} & 0_{a+b+1,t}\\ 0_{t,a+b+1} & D \end{pmatrix}$$

with a $(t \times t)$-diagonal matrix $D$ with elements $\frac{1}{n_{ij}}$ $(n_{ij} > 0)$, that is, because of $K^T(X^TX)^-K = T^TDT$, we get

$$F = \frac{\bar Y^TT\left(T^TDT\right)^{-1}T^T\bar Y}{Y^T\left[I_N - X(X^TX)^-X^T\right]Y} \cdot \frac{N - t}{q} \tag{5.28}$$

with $\bar Y = (\bar y_{11.}, \ldots, \bar y_{ab.})^T$. In (5.28) $q$ is the number of (linearly independent) rows of $K^T$ or $T^T$.

Before considering special cases ($n_{ij} = 1$, $n_{ij} = n$), we look at


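Example 5.10 For the values of Table 5.9 of Example 5.5, the hypothesis $H_0: b_1 - b_2 + (a,b)_{11} - (a,b)_{12} = 0$ is to be tested for $\alpha = 0.05$. Now $H_0$ is equivalent to $\eta_{11} - \eta_{12} = 0$, so that $T^T = (1, -1, 0, 0, 0)$. In Example 5.5 the generalised inverse $(X^TX)^-$ is the direct sum of a zero matrix and the diagonal matrix $D$ of the reciprocal subclass numbers $1/n_{ij}$ of the five occupied classes; the first two diagonal elements of $D$, the only ones needed here, are $\frac{1}{3}$ and $1$.

Further it is $\bar{Y}^T = (87, 99, 93, 93, 84)$, $q = 1$, $t = 5$ and $N = 12$. Moreover, with $b = \hat\beta$ from (5.5), we find that

\[ SS_{res} = Y^T\left[I_N - X(X^TX)^-X^T\right]Y = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}} y_{ijk}^2 - \sum_{i=1}^{a}\sum_{j=1}^{b} \frac{Y_{ij.}^2}{n_{ij}}, \qquad (5.29) \]

writing down only summands for $n_{ij} > 0$. In the example it is

\[ \sum_{i}\sum_{j}\sum_{k} y_{ijk}^2 = 98\,600, \qquad \sum_{i}\sum_{j} \frac{Y_{ij.}^2}{n_{ij}} = 98\,514 \]

and $SS_{res} = 86$. From

\[ T^T\bar{Y} = \bar{y}_{11.} - \bar{y}_{12.} = -12, \qquad T^TDT = \tfrac{4}{3}, \qquad (T^TDT)^{-1} = \tfrac{3}{4} \]

we get for $F$ in (5.28)

\[ F = \frac{(-12) \cdot \frac{3}{4} \cdot (-12)}{86} \cdot \frac{7}{1} = \frac{756}{86} = 8.791. \]

By this the null hypothesis is rejected, because $F(1, 7 \mid 0.95) = 5.59$.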
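The arithmetic of Example 5.10 can be retraced with a few lines of Python. This is only a sketch under stated assumptions: the function name `f_statistic` and the dictionary `d_needed` are illustrative, and only the two diagonal elements of $D$ that the contrast $T$ actually touches are supplied.

```python
from fractions import Fraction

# Values quoted in Example 5.10
cell_means = [87, 99, 93, 93, 84]                 # means of the t = 5 occupied classes
T = [1, -1, 0, 0, 0]                              # contrast eta_11 - eta_12
d_needed = {0: Fraction(1, 3), 1: Fraction(1)}    # 1/n_ij for the two classes used by T
ss_res = 86
N, t, q = 12, 5, 1


def f_statistic(means, contrast, d_diag, ss_res, n_obs, n_classes, q):
    """F of (5.28) for a single contrast (q = 1)."""
    tty = sum(c * m for c, m in zip(contrast, means))                         # T'Ybar
    tdt = sum(c * c * d_diag[i] for i, c in enumerate(contrast) if c != 0)    # T'DT
    numerator = tty * (1 / tdt) * tty
    return numerator / ss_res * Fraction(n_obs - n_classes, q)


F = f_statistic(cell_means, T, d_needed, ss_res, N, t, q)
print(float(F))
```

Exact fractions are used so that the intermediate values $T^TDT = 4/3$ and $756/86$ appear without rounding.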


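We now consider some special cases. Initially let $n_{ij} = n$, so that $t = ab$, $N = abn$ and $N - t = ab(n-1)$. The observations can be written in the form of Table 5.8. Because all classes are occupied, we have

\[ A_{ik} = a_i - a_k + \frac{1}{b}\sum_{j=1}^{b}\left[(a,b)_{ij} - (a,b)_{kj}\right] \quad (i,k = 1,\dots,a;\ i \ne k) \]

and

\[ B_{jl} = b_j - b_l + \frac{1}{a}\sum_{i=1}^{a}\left[(a,b)_{ij} - (a,b)_{il}\right] \quad (j,l = 1,\dots,b;\ j \ne l) \]

as estimable functions. This can easily be shown, because

\[ a_i - a_k + \frac{1}{b}\sum_{j=1}^{b}\left((a,b)_{ij} - (a,b)_{kj}\right) = \frac{1}{b}\sum_{j=1}^{b}\left(\eta_{ij} - \eta_{kj}\right). \]

The BLUEs of the $A_{ik}$ are

\[ \hat{A}_{ik} = \frac{1}{b}\sum_{j=1}^{b}\left(\bar{y}_{ij.} - \bar{y}_{kj.}\right), \]

and the BLUEs of the $B_{jl}$ are analogously

\[ \hat{B}_{jl} = \frac{1}{a}\sum_{i=1}^{a}\left(\bar{y}_{ij.} - \bar{y}_{il.}\right). \]

By this the null hypotheses

\[ H_{0A}: a_i + \frac{1}{b}\sum_{j=1}^{b}(a,b)_{ij} = a_a + \frac{1}{b}\sum_{j=1}^{b}(a,b)_{aj} \quad (i = 1,\dots,a-1), \]

\[ H_{0B}: b_j + \frac{1}{a}\sum_{i=1}^{a}(a,b)_{ij} = b_b + \frac{1}{a}\sum_{i=1}^{a}(a,b)_{ib} \quad (j = 1,\dots,b-1) \]

are testable. W.l.o.g. we consider only $H_{0A}$. We write $H_{0A}$ in the form

\[ a_i - a_a + \frac{1}{b}\sum_{j=1}^{b}(a,b)_{ij} - \frac{1}{b}\sum_{j=1}^{b}(a,b)_{aj} = 0 \quad (i = 1,\dots,a-1) \]

or

\[ K^T\beta = 0 \]

with

\[ K^T = \left(0_{a-1},\ I_{a-1},\ -1_{a-1},\ 0_{a-1,b},\ \tfrac{1}{b}\left(1_b^T \oplus \dots \oplus 1_b^T\right),\ -\tfrac{1}{b}1_{a-1,b}\right), \]

where the direct sum consists of $a-1$ summands $1_b^T$.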


We consider the next example. 


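Example 5.11 Assume that for the classification in Example 5.5 four observations per class are given. Then $a = 3$, $b = 2$ and

\[ \beta = (\mu, a_1, a_2, a_3, b_1, b_2, (a,b)_{11}, (a,b)_{12}, (a,b)_{21}, (a,b)_{22}, (a,b)_{31}, (a,b)_{32})^T. \]

We test the hypothesis

\[ H_0: a_1 + \tfrac{1}{2}\left((a,b)_{11} + (a,b)_{12}\right) = a_2 + \tfrac{1}{2}\left((a,b)_{21} + (a,b)_{22}\right) = a_3 + \tfrac{1}{2}\left((a,b)_{31} + (a,b)_{32}\right). \]

$K^T$ in $K^T\beta = 0$ has the form

\[ K^T = \left(\begin{array}{cccccccccccc} 0 & 1 & 0 & -1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 & 0 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} \end{array}\right). \]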


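If in the general case $K^T$ is given as above, $F$ in (5.27) can be simplified. With $\hat\beta$ from (5.18), we have

\[ \hat\beta^T K = \left(\frac{1}{b}\sum_{j=1}^{b}(\bar{y}_{1j.} - \bar{y}_{aj.}), \dots, \frac{1}{b}\sum_{j=1}^{b}(\bar{y}_{a-1,j.} - \bar{y}_{aj.})\right) = \left(\bar{y}_{1..} - \bar{y}_{a..}, \dots, \bar{y}_{a-1,..} - \bar{y}_{a..}\right). \]

Further

\[ K^T(X^TX)^-K = \frac{1}{n}K^{*T}K^* = \frac{1}{bn}M = \frac{1}{bn}\left(I_{a-1} + 1_{a-1,a-1}\right) \]

(and also the multiple $M$) is a $[(a-1) \times (a-1)]$ matrix of rank $a-1$. $K^{*T}$ is the matrix generated from the $ab$ last columns of $K^T$. Subtracting in $M$ the $(i+1)$th row from the $i$th row $(i = 1,\dots,a-2)$ and then adding the first column to the second one, the so generated new second column to the third and so forth, we find that $|M| = a$. By this it is

\[ \left|K^T(X^TX)^-K\right| = \frac{a}{(bn)^{a-1}}. \]

The minors of order $a-2$ belonging to the main diagonal elements of $M$ are $a-1$, and the others are $-1$, so that

\[ \left[K^T(X^TX)^-K\right]^{-1} = bn\begin{pmatrix} \frac{a-1}{a} & -\frac{1}{a} & \cdots & -\frac{1}{a} \\ -\frac{1}{a} & \frac{a-1}{a} & \cdots & -\frac{1}{a} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{a} & -\frac{1}{a} & \cdots & \frac{a-1}{a} \end{pmatrix} = bn\left(I_{a-1} - \frac{1}{a}1_{a-1,a-1}\right) \]

and $F$ becomes

\[ F_A = \frac{\left[\frac{1}{bn}\sum_{i=1}^{a}Y_{i..}^2 - \frac{1}{N}Y_{...}^2\right]ab(n-1)}{(a-1)SS_{res}}. \qquad (5.30) \]

For $H_{0B}$ we correspondingly obtain

\[ F_B = \frac{\left[\frac{1}{an}\sum_{j=1}^{b}Y_{.j.}^2 - \frac{1}{N}Y_{...}^2\right]ab(n-1)}{(b-1)SS_{res}}. \qquad (5.31) \]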
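Under the side conditions

\[ \sum_{j=1}^{b}(a,b)_{ij} = 0 \text{ for all } i, \qquad \sum_{i=1}^{a}(a,b)_{ij} = 0 \text{ for all } j, \]

$a_i - a_k$ and $b_j - b_l$ are estimable functions with the BLUEs

\[ \hat{a}_i - \hat{a}_k = \bar{y}_{i..} - \bar{y}_{k..} \quad (i \ne k) \]

and

\[ \hat{b}_j - \hat{b}_l = \bar{y}_{.j.} - \bar{y}_{.l.} \quad (j \ne l). \]

Then the test statistics (5.30) and (5.31) can be used to test the hypotheses
$H_{0A}: a_1 = \dots = a_a$ and $H_{0B}: b_1 = \dots = b_b$.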


In the case of equal subclass numbers, we use the side conditions (5.17) and test the hypotheses $H_{0A}$, $H_{0B}$ and

\[ H_{0AB}: (a,b)_{11} = \dots = (a,b)_{ab}\ (= 0) \]

with the F-statistics (5.30), (5.31) and

\[ F_{AB} = \frac{\left[\frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}Y_{ij.}^2 - \frac{1}{bn}\sum_{i=1}^{a}Y_{i..}^2 - \frac{1}{an}\sum_{j=1}^{b}Y_{.j.}^2 + \frac{1}{N}Y_{...}^2\right]ab(n-1)}{(a-1)(b-1)SS_{res}}, \qquad (5.32) \]

respectively.
The ANOVA table for this case is Table 5.13. Because of

\[ SS_T = SS_A + SS_B + SS_{AB} + SS_{res} \]

the F-statistics (5.30), (5.31) and (5.32) are, under the hypotheses $H_{0A}$, $H_{0B}$ and $H_{0AB}$, centrally distributed as $F[(a-1), ab(n-1)]$, $F[(b-1), ab(n-1)]$ and $F[(a-1)(b-1), ab(n-1)]$, respectively. Otherwise they are non-centrally F-distributed.


Example 5.12 We consider again Example 5.9 and the storages in glass and sack with four observations per subclass as shown in Table 5.14. Table 5.15 shows the calculation and Table 5.16 is the ANOVA table following Table 5.13. Due to the F-test, $H_{0A}$ has to be rejected but not $H_{0B}$ and $H_{0AB}$.
How many replications in the four subclasses are needed to test the hypothesis

\[ H_{0AB}: (a,b)_{11} = \dots = (a,b)_{22}\ (= 0) \]

with the precision requirements in the following R commands?
The input is

size.anova (model="axb", hypothesis="axb", a=2, b=2, 
alpha=0.05, beta=0.1, delta=2, cases="minimin") 


The result is 


n 
4 


The maximin size is 6. 
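The sums of squares and F-ratios of Example 5.12 can be recomputed from the observations of Table 5.14 with the formulas of Table 5.13. The following Python sketch is an illustration only; the function name `two_way_anova` and the dictionary layout are assumptions made for this example.

```python
# Observations of Table 5.14: data[storage][crop] holds the n = 4 values of one subclass
data = {
    "glass": {"green rye": [8.39, 7.68, 9.46, 8.12], "lucerne": [9.44, 10.12, 8.79, 8.89]},
    "sack": {"green rye": [5.42, 6.21, 4.98, 6.04], "lucerne": [5.56, 4.78, 6.18, 5.91]},
}


def two_way_anova(data):
    """Balanced two-way cross-classification with interactions (formulas of Table 5.13)."""
    rows = list(data)
    cols = list(next(iter(data.values())))
    a, b = len(rows), len(cols)
    n = len(data[rows[0]][cols[0]])
    N = a * b * n
    cell = {(r, c): sum(data[r][c]) for r in rows for c in cols}    # Y_ij.
    row_sum = {r: sum(cell[r, c] for c in cols) for r in rows}      # Y_i..
    col_sum = {c: sum(cell[r, c] for r in rows) for c in cols}      # Y_.j.
    total = sum(row_sum.values())                                   # Y_...
    corr = total ** 2 / N
    sum_sq = sum(y * y for r in rows for c in cols for y in data[r][c])
    ss_a = sum(v * v for v in row_sum.values()) / (b * n) - corr
    ss_b = sum(v * v for v in col_sum.values()) / (a * n) - corr
    ss_cells = sum(v * v for v in cell.values()) / n
    ss_ab = ss_cells - corr - ss_a - ss_b
    ss_res = sum_sq - ss_cells
    ms_res = ss_res / (a * b * (n - 1))
    return {
        "SS_A": ss_a, "SS_B": ss_b, "SS_AB": ss_ab, "SS_res": ss_res,
        "F_A": ss_a / (a - 1) / ms_res,
        "F_B": ss_b / (b - 1) / ms_res,
        "F_AB": ss_ab / ((a - 1) * (b - 1)) / ms_res,
    }


result = two_way_anova(data)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Here factor A is the kind of storage (rows) and factor B the forage crop (columns); the printed values agree with Table 5.16 up to rounding in the last digit.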
Table 5.13 Analysis of variance table of a two-way cross-classification with equal subclass numbers for model I with interactions under the condition (5.17).

\[ \begin{array}{llllll}
\text{Source of variation} & SS & df & MS & E(MS) & F \\ \hline
\text{Between rows } (A) & SS_A = \frac{1}{bn}\sum_i Y_{i..}^2 - \frac{1}{N}Y_{...}^2 & a-1 & \frac{SS_A}{a-1} & \sigma^2 + \frac{bn}{a-1}\sum_i a_i^2 & \frac{ab(n-1)SS_A}{(a-1)SS_{res}} \\
\text{Between columns } (B) & SS_B = \frac{1}{an}\sum_j Y_{.j.}^2 - \frac{1}{N}Y_{...}^2 & b-1 & \frac{SS_B}{b-1} & \sigma^2 + \frac{an}{b-1}\sum_j b_j^2 & \frac{ab(n-1)SS_B}{(b-1)SS_{res}} \\
\text{Interactions} & SS_{AB} = \frac{1}{n}\sum_{i,j} Y_{ij.}^2 - \frac{1}{bn}\sum_i Y_{i..}^2 - \frac{1}{an}\sum_j Y_{.j.}^2 + \frac{1}{N}Y_{...}^2 & (a-1)(b-1) & \frac{SS_{AB}}{(a-1)(b-1)} & \sigma^2 + \frac{n}{(a-1)(b-1)}\sum_{i,j}(a,b)_{ij}^2 & \frac{ab(n-1)SS_{AB}}{(a-1)(b-1)SS_{res}} \\
\text{Within classes (residual)} & SS_{res} = \sum_{i,j,k} y_{ijk}^2 - \frac{1}{n}\sum_{i,j} Y_{ij.}^2 & ab(n-1) & \frac{SS_{res}}{ab(n-1)} & \sigma^2 & \\
\text{Total} & SS_T = \sum_{i,j,k} y_{ijk}^2 - \frac{1}{N}Y_{...}^2 & N-1 & & &
\end{array} \]

Table 5.14 Observations of the carotene storage experiment of Example 5.12.

                        Forage crop
Kind of storage    Green rye    Lucerne
Glass              8.39         9.44
                   7.68         10.12
                   9.46         8.79
                   8.12         8.89
Sack               5.42         5.56
                   6.21         4.78
                   4.98         6.18
                   6.04         5.91

Table 5.15 Class sums $Y_{ij.}$ and other results for the observations of Table 5.14.

                        Forage crop
Kind of storage    Green rye    Lucerne      Y_i..     Y_i..^2      Sum_j Y_ij.^2
Glass              33.65        37.24        70.89     5025.3921    2519.1401
Sack               22.65        22.43        45.08     2032.2064    1016.1274
Y_.j.              56.30        59.67        115.97    7057.5985    3535.2675
Y_.j.^2            3169.6900    3560.5089    6730.1989
Sum_i Y_ij.^2      1645.3450    1889.9225    3535.2675

Table 5.16 Analysis of variance table for the carotene storage experiment of Example 5.12.

Source of variation            SS         df    MS         F
Between the kind of storage    41.6347    1     41.6347    101.70
Between the forage crops       0.7098     1     0.7098     1.73
Interactions                   0.9073     1     0.9073     2.22
Within classes (residual)      4.9128     12    0.4094
Total                          48.1646    15

A further special case is $n_{ij} = 1$. We also consider this case under the side conditions (5.17). Then the following theorem can be stated.

Theorem 5.9 (Tukey, 1949).
The random variables $y_{ij}$ $(i = 1,\dots,a;\ j = 1,\dots,b)$ may be represented in the form of Equation (5.15) with $n_{ij} = 1$ for all $i, j$, and (5.17) as well as $(a,b)_{ij} = a_i b_j$ may be fulfilled. The $e_{ij}$ in (5.15) are independent of each other and $N(0, \sigma^2)$-distributed for all $i, j$. Then, with the notation of Table 5.13 (for $n = 1$) and

\[ SS_N = \frac{ab\left[\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{i.} - \bar{y}_{..})(\bar{y}_{.j} - \bar{y}_{..})(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})\right]^2}{SS_A\,SS_B}, \qquad (5.33) \]

the statistic

\[ F = \frac{SS_N}{SS_{AB} - SS_N}\left[(a-1)(b-1) - 1\right] \qquad (5.34) \]

is $F[1, (a-1)(b-1)-1]$-distributed if the null hypothesis $H_{0AB}: (a,b)_{ij} = 0$ for all $i, j$ is valid.
Before showing this theorem we prove two lemmas. 


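Lemma 5.4 Under the assumptions of Theorem 5.9, we have the following:

a) $\hat\mu = \bar{y}_{..}$ is independent of $\hat{a}_i = \bar{y}_{i.} - \bar{y}_{..}$, $\hat{b}_j = \bar{y}_{.j} - \bar{y}_{..}$ and $(\widehat{ab})_{ij} = y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}$ for all $i, j$.

b) $\hat{a}_i$ and $\hat{b}_j$ are independent for all $i, j$.

c) $\hat{a}_i$ and $(\widehat{ab})_{kl}$ are independent for all $i, k, l$.

d) $\hat{b}_j$ and $(\widehat{ab})_{kl}$ are independent for all $j, k, l$.

e) $\hat\mu$ is $N\left(\mu, \frac{1}{ab}\sigma^2\right)$-distributed, the $\hat{a}_i$ are $N\left(a_i, \frac{a-1}{ab}\sigma^2\right)$-distributed for all $i$, the $\hat{b}_j$ are $N\left(b_j, \frac{b-1}{ab}\sigma^2\right)$-distributed for all $j$, the $(\widehat{ab})_{ij}$ are $N\left((a,b)_{ij}, \frac{(a-1)(b-1)}{ab}\sigma^2\right)$-distributed for all $i, j$ and the corresponding SS are $\chi^2$-distributed.

We further have

f) $\mathrm{cov}(\hat{a}_i, \hat{a}_j) = -\frac{1}{ab}\sigma^2$ for $i \ne j$, $\mathrm{cov}(\hat{b}_i, \hat{b}_j) = -\frac{1}{ab}\sigma^2$ for $i \ne j$,

\[ \mathrm{cov}\left((\widehat{ab})_{ij}, (\widehat{ab})_{kl}\right) = \frac{\sigma^2}{ab}(a\delta_{ik} - 1)(b\delta_{jl} - 1). \]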


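Proof: By the assumptions the $y_{ij}$ are $N(\mu + a_i + b_j + (a,b)_{ij}, \sigma^2)$-distributed. The estimators are, as linear combinations of the $y_{ij}$, also normally distributed. From (5.17) follows $E(\bar{y}_{..}) = \mu$, $E(\hat{a}_i) = a_i$, $E(\hat{b}_j) = b_j$, $E\left((\widehat{ab})_{ij}\right) = (a,b)_{ij}$ for all $i, j$.

Now we get

\[ \mathrm{var}(\hat\mu) = \mathrm{var}(\bar{y}_{..}) = \frac{\sigma^2}{ab} \]

and

\[ \mathrm{var}(\hat{a}_i) = \mathrm{var}\left[\frac{1}{b}\sum_{j=1}^{b}y_{ij} - \frac{1}{ab}\sum_{k=1}^{a}\sum_{j=1}^{b}y_{kj}\right] = \frac{1}{a^2b^2}\,\mathrm{var}\left[(a-1)\sum_{j=1}^{b}y_{ij} - \sum_{k \ne i}\sum_{j=1}^{b}y_{kj}\right]. \]

Because the two terms within the square bracket are independent due to the assumption, we have

\[ \mathrm{var}(\hat{a}_i) = \frac{1}{a^2b^2}\left[(a-1)^2 b\sigma^2 + (a-1)b\sigma^2\right] = \frac{a-1}{ab}\sigma^2. \]

Analogously the other relations under (f) follow. By this (e) and (f) are proved. To show the independencies in (a) to (d), due to (e), we have only to show that the correlations are zero.

For (a) $\mathrm{cov}(\bar{y}_{..}, \hat{a}_i) = \mathrm{cov}(\bar{y}_{..}, \bar{y}_{i.} - \bar{y}_{..}) = \mathrm{cov}(\bar{y}_{..}, \bar{y}_{i.}) - \mathrm{var}(\bar{y}_{..})$, and because of

\[ \mathrm{cov}(\bar{y}_{..}, \bar{y}_{i.}) = \frac{1}{ab^2}\,\mathrm{cov}\left(\sum_{k=1}^{a}\sum_{j=1}^{b}y_{kj}, \sum_{j=1}^{b}y_{ij}\right) = \frac{\sigma^2}{ab}, \qquad \mathrm{var}(\bar{y}_{..}) = \frac{\sigma^2}{ab} \]

this covariance is zero. The proof of the other statements in (a) to (d) we leave as exercises.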


Lemma 5.5 Under the conditions of Theorem 5.9

\[ u = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b}\hat{a}_i\hat{b}_j(\widehat{ab})_{ij}}{\sqrt{\sum_{i=1}^{a}\hat{a}_i^2\,\sum_{j=1}^{b}\hat{b}_j^2}} \]

is $N(0, \sigma^2)$-distributed, if $(a,b)_{ij} = 0$ for all $i, j$.

Proof: We consider the $(a + b + 1)$-dimensional random variable $(u, \hat{a}_1, \dots, \hat{a}_a, \hat{b}_1, \dots, \hat{b}_b)$ and show that the conditional distribution of $u$ for given realisations of the $\hat{a}_i$ and $\hat{b}_j$ is independent of these realisations and by this equal to the marginal distribution of $u$.

For fixed $\hat{a}_i$, $\hat{b}_j$ the variable $u$ is a linear combination of the normally distributed $(\widehat{ab})_{ij}$ (from Lemma 5.4), and therefore the conditional distribution of $u$ is a normal distribution. We have $E\left((\widehat{ab})_{ij}\right) = (a,b)_{ij}$, so that under the assumption $(a,b)_{ij} = 0$

\[ E(u \mid \hat{a}_i, \hat{b}_j) = 0 \quad (i = 1,\dots,a;\ j = 1,\dots,b) \]

is independent of the $\hat{a}_i$ and $\hat{b}_j$. From (e) and (f) of Lemma 5.4, $\mathrm{var}(u \mid \hat{a}_i, \hat{b}_j) = \sigma^2$ follows. Because the expectation and the variance of $u$ are independent of the conditions and $u$ is normally distributed, the statement follows.


Proof of Theorem 5.9:
Under the hypothesis $H_{0AB}$: all $(a,b)_{ij} = 0$, the sum of squares of interactions divided by $\sigma^2$, $\frac{1}{\sigma^2}SS_{AB}$, is distributed as $CS[(a-1)(b-1)]$. We assume that $(a,b)_{ij} = 0$ for all $i, j$. From Lemma 5.5 it follows that $\frac{SS_N}{\sigma^2} = \frac{u^2}{\sigma^2}$ is $CS(1)$-distributed. Because $\frac{SS_{AB}}{\sigma^2} - \frac{SS_N}{\sigma^2}$ is non-negative (Schwarz's inequality), it follows from Theorem 4.6 that this difference is distributed as $CS[(a-1)(b-1)-1]$. From Corollary 4.1 of Theorem 4.6, $\frac{SS_N}{\sigma^2}$ and $\frac{SS_{AB}}{\sigma^2} - \frac{SS_N}{\sigma^2}$ are independent of each other. This completes the proof.

In applications the results of Theorem 5.9 are often used as follows. With the F-statistic of Theorem 5.9, the hypothesis $H_{0AB}: (a,b)_{ij} = 0$ is tested. If $H_{0AB}$ is rejected, a new experiment to test $H_{0A}$ and $H_{0B}$ with $n > 1$ has to be carried out. If $H_{0AB}$ is accepted, $H_{0A}$ and $H_{0B}$ (often with the same observations) are tested with the test statistic in Table 5.10. Concerning the problems of such an approach, we refer the reader to special literature.
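The statistic of Theorem 5.9 is easy to evaluate for a table with one observation per class. The following Python sketch computes $SS_N$ from (5.33) and $F$ from (5.34); the function name `tukey_nonadditivity` and the 3 x 4 data table are hypothetical and serve only as an illustration.

```python
def tukey_nonadditivity(y):
    """Tukey's one-degree-of-freedom statistic for an a x b table with n = 1.

    Returns SS_N of (5.33), SS_AB and the F-statistic of (5.34).
    """
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    row_eff = [sum(row) / b - grand for row in y]                              # estimates of a_i
    col_eff = [sum(y[i][j] for i in range(a)) / a - grand for j in range(b)]   # estimates of b_j
    inter = [[y[i][j] - grand - row_eff[i] - col_eff[j] for j in range(b)]
             for i in range(a)]                                                # estimates of (a,b)_ij
    ss_a = b * sum(e * e for e in row_eff)
    ss_b = a * sum(e * e for e in col_eff)
    ss_ab = sum(v * v for row in inter for v in row)
    cross = sum(row_eff[i] * col_eff[j] * inter[i][j]
                for i in range(a) for j in range(b))
    ss_n = a * b * cross ** 2 / (ss_a * ss_b)
    df2 = (a - 1) * (b - 1) - 1
    f_stat = ss_n / (ss_ab - ss_n) * df2
    return ss_n, ss_ab, f_stat


# Hypothetical 3 x 4 table (illustration only, not data from the text)
y = [
    [10.0, 12.0, 15.0, 19.0],
    [11.0, 14.0, 18.0, 25.0],
    [13.0, 17.0, 24.0, 34.0],
]
ss_n, ss_ab, f_stat = tukey_nonadditivity(y)
print(ss_n, ss_ab, f_stat)
```

The resulting value would be compared with the quantile of the $F[1, (a-1)(b-1)-1]$ distribution; the quantile itself is not computed here because the Python standard library offers no F-distribution.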


5.3.2. Nested Classification (A>B) 


A nested classification is a classification with super- and subordinated factors, 
where the levels of a subordinated or nested factor can be considered as further 
subdivision of the levels of the superordinated factor. Each level of the nested 
factor occurs in just one level of the superordinated factor. An example is 
the subdivision of the United States into states (superordinated factor A) and 
counties (nested factor B). Table 5.17 shows observations of a two-way nested 
classification. 

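As for the cross-classification we assume that the random variables $y_{ijk}$ in Table 5.17 vary randomly around the expectations $\eta_{ij}$, that is,

\[ y_{ijk} = \eta_{ij} + e_{ijk} \quad (i = 1,\dots,a;\ j = 1,\dots,b_i;\ k = 1,\dots,n_{ij}), \]

and that the $e_{ijk}$ are independent of each other and $N(0, \sigma^2)$-distributed.

Table 5.17 Observations $y_{ijk}$ of a two-way nested classification.

\[ \begin{array}{lcccc}
\text{Levels of the factor } A & A_1 & A_2 & \cdots & A_a \\
\text{Levels of the nested factor } B & B_{11}\ \dots\ B_{1b_1} & B_{21}\ \dots\ B_{2b_2} & \cdots & B_{a1}\ \dots\ B_{ab_a} \\ \hline
\text{Observations} & y_{111}\ \dots\ y_{1b_11} & y_{211}\ \dots\ y_{2b_21} & \cdots & y_{a11}\ \dots\ y_{ab_a1} \\
 & y_{112}\ \dots\ y_{1b_12} & y_{212}\ \dots\ y_{2b_22} & \cdots & y_{a12}\ \dots\ y_{ab_a2} \\
 & \vdots & \vdots & & \vdots \\
 & y_{11n_{11}}\ \dots\ y_{1b_1n_{1b_1}} & y_{21n_{21}}\ \dots\ y_{2b_2n_{2b_2}} & \cdots & y_{a1n_{a1}}\ \dots\ y_{ab_an_{ab_a}}
\end{array} \]

With

\[ \mu = \bar\eta_{..} = \frac{1}{N}\sum_{i=1}^{a}\sum_{j=1}^{b_i} n_{ij}\eta_{ij} \]

the total mean of the experiment is defined.
In a nested classification, interactions cannot be defined.
Analogously to Definition 5.2 we give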


Definition 5.5 The difference $a_i = \bar\eta_{i.} - \mu$ is called the effect of the $i$th level of factor $A$, and the difference $b_{ij} = \eta_{ij} - \bar\eta_{i.}$ is the effect of the $j$th level of $B$ within the $i$th level of $A$.


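By this the model equation for $y_{ijk}$ is

\[ y_{ijk} = \mu + a_i + b_{ij} + e_{ijk} \qquad (5.35) \]

(interactions do not exist). It is easy to see that (5.35) is a special case of (5.1) if

\[ Y = (y_{111}, \dots, y_{11n_{11}}, y_{121}, \dots, y_{12n_{12}}, \dots, y_{ab_an_{ab_a}})^T, \]

\[ \beta = (\mu, a_1, \dots, a_a, b_{11}, \dots, b_{ab_a})^T, \]

\[ e = (e_{111}, \dots, e_{11n_{11}}, e_{121}, \dots, e_{12n_{12}}, \dots, e_{ab_an_{ab_a}})^T \]

and $X$ is a matrix of zeros and ones so that (5.35) is valid. From the assumption it follows that $e$ is $N(0_N, \sigma^2 I_N)$-distributed $\left(N = \sum_{i,j} n_{ij}\right)$. $Y$ and $e$ are $(N \times 1)$ vectors, and $\beta$ is a $[(a + 1 + B_.) \times 1]$ vector $\left(B_. = \sum_{i=1}^{a} b_i\right)$.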


Example 5.13 demonstrates the choice of the matrix $X$.

Example 5.13 In Table 5.18 observations of a two-way nested classification are given (artificial data).

Table 5.18 Observations of a two-way nested classification (Example 5.13).

           A_1             A_2
           B_11    B_12    B_21    B_22    B_23
y_ijk      14      12      12      6       8
                   15      14      5       12
                   18              10
                                   7
n_ij       1       3       2       4       2

Now we have

\[ Y = (14, 12, 15, 18, 12, 14, 6, 5, 10, 7, 8, 12)^T \]

and

\[ \beta = (\mu, a_1, a_2, b_{11}, b_{12}, b_{21}, b_{22}, b_{23})^T. \]

Then

\[ X = \begin{pmatrix}
1&1&0&1&0&0&0&0\\
1&1&0&0&1&0&0&0\\
1&1&0&0&1&0&0&0\\
1&1&0&0&1&0&0&0\\
1&0&1&0&0&1&0&0\\
1&0&1&0&0&1&0&0\\
1&0&1&0&0&0&1&0\\
1&0&1&0&0&0&1&0\\
1&0&1&0&0&0&1&0\\
1&0&1&0&0&0&1&0\\
1&0&1&0&0&0&0&1\\
1&0&1&0&0&0&0&1
\end{pmatrix} = \left(1_{12},\ 1_4 \oplus 1_8,\ 1_1 \oplus 1_3 \oplus 1_2 \oplus 1_4 \oplus 1_2\right), \]

and $X^TX$ becomes

\[ X^TX = \begin{pmatrix}
12&4&8&1&3&2&4&2\\
4&4&0&1&3&0&0&0\\
8&0&8&0&0&2&4&2\\
1&1&0&1&0&0&0&0\\
3&3&0&0&3&0&0&0\\
2&0&2&0&0&2&0&0\\
4&0&4&0&0&0&4&0\\
2&0&2&0&0&0&0&2
\end{pmatrix}. \]

The matrix $X^TX$ is of order 8 and, as can easily be seen, of rank 5, because the second and third rows sum up to the first row, the fourth and fifth rows sum up to the second one and the last three rows sum up to the third one.

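This may be generalised. One column of $X$ corresponds to $\mu$; $a$ columns correspond to the levels of $A$ (the $a_i$; $i = 1,\dots,a$); and $B_. = \sum_{i=1}^{a} b_i$ columns correspond to the levels of $B$ within the levels of $A$. The order of $X^TX$ equals the number of the columns of $X$, and by this it is $1 + a + B_.$. With $N_{i.} = \sum_{j=1}^{b_i} n_{ij}$, $X^TX$ has the form

\[ X^TX = \left(\begin{array}{ccccccccccc}
N & N_{1.} & \cdots & N_{a.} & n_{11} & \cdots & n_{1b_1} & \cdots & n_{a1} & \cdots & n_{ab_a} \\
N_{1.} & N_{1.} & \cdots & 0 & n_{11} & \cdots & n_{1b_1} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\
N_{a.} & 0 & \cdots & N_{a.} & 0 & \cdots & 0 & \cdots & n_{a1} & \cdots & n_{ab_a} \\
n_{11} & n_{11} & \cdots & 0 & n_{11} & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots & & \vdots & & \vdots \\
n_{1b_1} & n_{1b_1} & \cdots & 0 & 0 & \cdots & n_{1b_1} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & \ddots & \vdots & & \vdots \\
n_{a1} & 0 & \cdots & n_{a1} & 0 & \cdots & 0 & \cdots & n_{a1} & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & & \vdots & \ddots & \vdots \\
n_{ab_a} & 0 & \cdots & n_{ab_a} & 0 & \cdots & 0 & \cdots & 0 & \cdots & n_{ab_a}
\end{array}\right). \]

As we see, the first row is the sum of the $a$ following rows, and the $i$th of these $a$ following rows, that is, the $(i+1)$th row, is the sum of the $b_i$ rows with the row numbers $a + 2 + \sum_{j=1}^{i-1} b_j$ up to $a + 1 + \sum_{j=1}^{i} b_j$. That means there are $a + 1$ linear relations between the rows of $X^TX$. Therefore $\mathrm{rk}(X^TX)$ is smaller than or equal to $B_.$. But $\mathrm{rk}(X^TX) = B_.$, because the $B_.$ last rows and columns form a non-singular submatrix with the inverse

\[ \mathrm{diag}\left(\frac{1}{n_{11}}, \dots, \frac{1}{n_{ab_a}}\right), \]

and by this a generalised inverse $(X^TX)^-$ of $X^TX$ is given by a matrix of order $a + 1 + B_.$. Its elements, apart from the $B_.$ last ones in the main diagonal, are equal to zero. In the main diagonal are $a + 1$ zeros and further the $B_.$ values $\frac{1}{n_{ij}}$ $(i = 1,\dots,a;\ j = 1,\dots,b_i)$.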
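The construction of this generalised inverse can be checked numerically. The Python sketch below builds $X$ for the subclass numbers $n_{ij} = (1, 3 \mid 2, 4, 2)$ of Example 5.13, forms $X^TX$ and verifies the defining property $X^TX\,(X^TX)^-\,X^TX = X^TX$ with exact fractions. All variable and function names are illustrative assumptions.

```python
from fractions import Fraction

n = [[1, 3], [2, 4, 2]]            # subclass numbers n_ij of Example 5.13
a = len(n)
B_dot = sum(len(row) for row in n)
p = 1 + a + B_dot                  # number of columns of X

# Build X row by row: column 0 for mu, then the a_i, then the b_ij
X = []
col = 1 + a
for i, row in enumerate(n):
    for n_ij in row:
        x_row = [0] * p
        x_row[0] = 1
        x_row[1 + i] = 1
        x_row[col] = 1
        X.extend([x_row] * n_ij)   # n_ij identical rows for this subclass
        col += 1


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*B)] for r in A]


XtX = matmul([list(c) for c in zip(*X)], X)

# Generalised inverse: zeros except 1/n_ij in the last B. diagonal positions
G = [[Fraction(0)] * p for _ in range(p)]
flat_n = [n_ij for row in n for n_ij in row]
for k, n_ij in enumerate(flat_n):
    G[1 + a + k][1 + a + k] = Fraction(1, n_ij)

check = matmul(matmul(XtX, G), XtX)
print(check == XtX)
```

Working with `Fraction` avoids rounding, so the comparison is an exact identity rather than an approximate one.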
Example 5.14 We consider the matrix $X^TX$ of Example 5.13. We derive $(X^TX)^-$ as shown above and obtain

\[ (X^TX)^- = \mathrm{diag}\left(0, 0, 0, 1, \tfrac{1}{3}, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{2}\right). \]

The reader may show as an exercise that $X^TX(X^TX)^-X^TX = X^TX$. As matrix $B$ in (5.4), we may choose, for instance, a $[(a + 1) \times (a + 1 + B_.)]$ matrix corresponding to the side conditions

\[ \sum_{i=1}^{a} a_i = \sum_{j=1}^{b_i} b_{ij} = 0 \quad (\text{for all } i). \]


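We see that

\[ B^TB = 0 \oplus N^2 1_{a,a} \oplus N^2 1_{b_1,b_1} \oplus \dots \oplus N^2 1_{b_a,b_a} \]

is of rank $a + 1$. We choose instead

\[ B = \left(\begin{array}{ccccccccccc}
0 & N_{1.} & \cdots & N_{a.} & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & n_{11} & \cdots & n_{1b_1} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & n_{a1} & \cdots & n_{ab_a}
\end{array}\right) \]

and this corresponds to the side conditions

\[ \sum_{i=1}^{a} N_{i.}a_i = 0, \qquad \sum_{j=1}^{b_i} n_{ij}b_{ij} = 0 \quad (\text{for all } i). \qquad (5.36) \]

Minimising

\[ \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{n_{ij}} \left(y_{ijk} - \mu - a_i - b_{ij}\right)^2 \]

under these side conditions without the cumbersome calculation of $(X^TX + B^TB)^{-1}$, we obtain the BLUE (MSE)

\[ \hat\mu = \bar{y}_{...}, \qquad \hat{a}_i = \bar{y}_{i..} - \bar{y}_{...}, \qquad \hat{b}_{ij} = \bar{y}_{ij.} - \bar{y}_{i..}. \qquad (5.37) \]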


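Theorem 5.10 In a two-way nested classification, we have

\[ SS_T = \sum_{i,j,k}\left(y_{ijk} - \bar{y}_{...}\right)^2 = \sum_{i,j,k}\left(\bar{y}_{i..} - \bar{y}_{...}\right)^2 + \sum_{i,j,k}\left(\bar{y}_{ij.} - \bar{y}_{i..}\right)^2 + \sum_{i,j,k}\left(y_{ijk} - \bar{y}_{ij.}\right)^2 \]

or

\[ SS_T = SS_A + SS_{B\ in\ A} + SS_{res}, \]

where $SS_A$ is the SS between the A-levels, $SS_{B\ in\ A}$ is the SS between the B-levels within the A-levels and $SS_{res}$ is the SS within the classes (B-levels).
The degrees of freedom of these SS are:

SS              df
SS_T            N - 1
SS_A            a - 1
SS_{B in A}     B. - a
SS_res          N - B.

The SS may also be written in the form

\[ SS_T = \sum_{i,j,k} y_{ijk}^2 - \frac{Y_{...}^2}{N}, \qquad SS_A = \sum_{i}\frac{Y_{i..}^2}{N_{i.}} - \frac{Y_{...}^2}{N}, \]

\[ SS_{B\ in\ A} = \sum_{i,j}\frac{Y_{ij.}^2}{n_{ij}} - \sum_{i}\frac{Y_{i..}^2}{N_{i.}}, \qquad SS_{res} = \sum_{i,j,k} y_{ijk}^2 - \sum_{i,j}\frac{Y_{ij.}^2}{n_{ij}}. \]

The expectations of the MS are given in Table 5.19.
Here and later we assume the side conditions (5.36).

Table 5.19 Analysis of variance table of the two-way nested classification for model I.

\[ \begin{array}{lllll}
\text{Source of variation} & SS & df & MS & E(MS) \\ \hline
\text{Between A-levels} & SS_A = \sum_i \frac{Y_{i..}^2}{N_{i.}} - \frac{Y_{...}^2}{N} & a-1 & \frac{SS_A}{a-1} & \sigma^2 + \frac{1}{a-1}\sum_i N_{i.}a_i^2 \\
\text{Between B-levels within A-levels} & SS_{B\ in\ A} = \sum_{i,j}\frac{Y_{ij.}^2}{n_{ij}} - \sum_i \frac{Y_{i..}^2}{N_{i.}} & B_.-a & \frac{SS_{B\ in\ A}}{B_.-a} & \sigma^2 + \frac{1}{B_.-a}\sum_{i,j} n_{ij}b_{ij}^2 \\
\text{Within B-levels (residual)} & SS_{res} = \sum_{i,j,k} y_{ijk}^2 - \sum_{i,j}\frac{Y_{ij.}^2}{n_{ij}} & N-B_. & \frac{SS_{res}}{N-B_.} & \sigma^2 \\
\text{Total} & SS_T = \sum_{i,j,k} y_{ijk}^2 - \frac{Y_{...}^2}{N} & N-1 & &
\end{array} \]

With the results of Chapter 4, we obtain the following theorem.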


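Theorem 5.11 The sums of squares $SS_A$, $SS_{B\ in\ A}$ and $SS_{res}$ in Table 5.19, each divided by $\sigma^2$, are independent of each other and distributed as $CS(a-1, \lambda_a)$, $CS(B_.-a, \lambda_b)$ and $CS(N-B_.)$, respectively, where

\[ \lambda_a = \frac{1}{\sigma^2}E(Y^T)(B_2 - B_3)E(Y), \qquad \lambda_b = \frac{1}{\sigma^2}E(Y^T)(B_1 - B_2)E(Y). \]

Here $B_1$ is the direct sum of $B_.$ matrices $C_{ij}$ of order $n_{ij}$:

\[ B_1 = C_{11} \oplus \dots \oplus C_{ab_a}; \]

the elements of $C_{ij}$ are all equal to $n_{ij}^{-1}$. $B_2$ is the direct sum of $a$ matrices $G_i$ of order $N_{i.}$:

\[ B_2 = G_1 \oplus \dots \oplus G_a; \]

the elements of $G_i$ are all equal to $N_{i.}^{-1}$. $B_3$ is a matrix of order $N$; all of its elements are equal to $N^{-1}$.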


Proof: We only have to show that the quadratic forms $SS_A$, $SS_{B\ in\ A}$ and $SS_{res}$ fulfil the assumptions of Theorem 4.6. We have

\[ \sum_{i,j,k} y_{ijk}^2 = Y^TY \quad \text{and} \quad \sum_{i,j}\frac{Y_{ij.}^2}{n_{ij}} = Y^TB_1Y \]

and further

\[ \sum_{i}\frac{Y_{i..}^2}{N_{i.}} = Y^TB_2Y. \]

$B_2$ is the direct sum of $a$ matrices of order $N_{i.}$ with the elements $N_{i.}^{-1}$. Finally

\[ \frac{Y_{...}^2}{N} = Y^TB_3Y. \]

We have

\[ \mathrm{rk}(B_1) = B_., \quad \mathrm{rk}(B_2) = a, \quad \mathrm{rk}(B_3) = 1. \]

Further $B_1$, $B_2$ and $B_3$ are idempotent (Condition 1 of Theorem 4.6). In

\[ SS_A = Y^T(B_2 - B_3)Y, \quad SS_{B\ in\ A} = Y^T(B_1 - B_2)Y, \quad SS_{res} = Y^T(I_N - B_1)Y, \]

the matrices of the SS have the ranks

\[ \mathrm{rk}(B_2 - B_3) = a-1, \quad \mathrm{rk}(B_1 - B_2) = B_. - a, \quad \mathrm{rk}(I_N - B_1) = N - B_.. \]

Here $I_N - B_1 + B_1 - B_2 + B_2 - B_3 = I_N - B_3$ is the matrix of the quadratic form $SS_T$ of rank $N-1$. By this two conditions of Theorem 4.6 are fulfilled, and Theorem 5.11 is proven.


Example 5.15 For the data of Example 5.13, we get

\[ Y = (14, 12, 15, 18, 12, 14, 6, 5, 10, 7, 8, 12)^T, \]

\[ B_1 = 1 \oplus \tfrac{1}{3}1_{3,3} \oplus \tfrac{1}{2}1_{2,2} \oplus \tfrac{1}{4}1_{4,4} \oplus \tfrac{1}{2}1_{2,2}, \]

where $\oplus$ is again the symbol of a direct sum. We have $\mathrm{rk}(B_1) = 5$ because each summand has rank 1. Further $B_2$ is the direct sum of the matrix of order 4 with elements $\frac{1}{4}$ and the matrix of order 8 with elements $\frac{1}{8}$, where we have $\mathrm{rk}(B_2) = 2$. $B_3$ is the matrix of order 12 of rank 1 with elements $\frac{1}{12}$. From this we obtain the following matrices. $B_2 - B_3$ has the elements $\frac{1}{4} - \frac{1}{12} = \frac{1}{6}$ in the block of rows and columns 1 to 4, the elements $\frac{1}{8} - \frac{1}{12} = \frac{1}{24}$ in the block of rows and columns 5 to 12 and the elements $-\frac{1}{12}$ elsewhere. Further

\[ B_1 - B_2 = \left[\left(1 \oplus \tfrac{1}{3}1_{3,3}\right) - \tfrac{1}{4}1_{4,4}\right] \oplus \left[\left(\tfrac{1}{2}1_{2,2} \oplus \tfrac{1}{4}1_{4,4} \oplus \tfrac{1}{2}1_{2,2}\right) - \tfrac{1}{8}1_{8,8}\right] \]

and

\[ I_{12} - B_1 = 0 \oplus \left(I_3 - \tfrac{1}{3}1_{3,3}\right) \oplus \left(I_2 - \tfrac{1}{2}1_{2,2}\right) \oplus \left(I_4 - \tfrac{1}{4}1_{4,4}\right) \oplus \left(I_2 - \tfrac{1}{2}1_{2,2}\right). \]

The reader may check as an exercise that $B_2 - B_3$, $B_1 - B_2$ and $I_{12} - B_1$ are idempotent and that $(B_2 - B_3)(B_1 - B_2) = (B_2 - B_3)(I_{12} - B_1) = 0_{12,12}$.
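The exercise can also be checked by machine. The following Python sketch builds $B_1$, $B_2$ and $B_3$ for the subclass numbers of Example 5.13 with exact fractions and tests idempotence, the two zero products and the ranks (for idempotent matrices the rank equals the trace). The helper names are assumptions made for this illustration.

```python
from fractions import Fraction

n = [[1, 3], [2, 4, 2]]                    # subclass numbers of Example 5.13
N = sum(map(sum, n))


def block_mean_matrix(sizes):
    """Direct sum of blocks of order s whose elements are all 1/s."""
    total = sum(sizes)
    M = [[Fraction(0)] * total for _ in range(total)]
    start = 0
    for s in sizes:
        for r in range(start, start + s):
            for c in range(start, start + s):
                M[r][c] = Fraction(1, s)
        start += s
    return M


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def subtract(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


B1 = block_mean_matrix([n_ij for row in n for n_ij in row])
B2 = block_mean_matrix([sum(row) for row in n])
B3 = block_mean_matrix([N])
identity = [[Fraction(int(r == c)) for c in range(N)] for r in range(N)]

Q_A = subtract(B2, B3)          # matrix of SS_A
Q_B = subtract(B1, B2)          # matrix of SS_{B in A}
Q_res = subtract(identity, B1)  # matrix of SS_res
zero = [[Fraction(0)] * N for _ in range(N)]

idempotent = all(matmul(Q, Q) == Q for Q in (Q_A, Q_B, Q_res))
orthogonal = matmul(Q_A, Q_B) == zero and matmul(Q_A, Q_res) == zero
ranks = [sum(Q[r][r] for r in range(N)) for Q in (Q_A, Q_B, Q_res)]  # trace = rank here
print(idempotent, orthogonal, ranks)
```

The three traces reproduce the degrees of freedom $a - 1 = 1$, $B_. - a = 3$ and $N - B_. = 7$.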
From Theorem 5.11 it follows that, with $\lambda_a$ and $\lambda_b$ defined in Theorem 5.11,

\[ F_A = \frac{MS_A}{MS_{res}} \text{ is distributed as } F(a-1, N-B_., \lambda_a) \]

and

\[ F_B = \frac{MS_{B\ in\ A}}{MS_{res}} \text{ is distributed as } F(B_.-a, N-B_., \lambda_b). \]

If (5.36) is valid, $F_A$ can be used to test the hypothesis $H_{0A}: a_1 = \dots = a_a$ because under this hypothesis $\lambda_a$ equals 0 (applying $\sum_{i=1}^{a} N_{i.}a_i = 0$). Analogously $F_B$ can be used to test the hypothesis $H_{0B}: b_{i1} = \dots = b_{ib_i}$ for all $i$, because then (applying $\sum_{j=1}^{b_i} n_{ij}b_{ij} = 0$) $\lambda_b$ equals zero. $H_{0B}$ is also testable if (5.36) is not true.


Example 5.16 We calculate the analysis of variance table for Example 5.13. We have

\[ Y_{11.} = 14, \quad Y_{12.} = 45, \quad Y_{1..} = 59, \]
\[ Y_{21.} = 26, \quad Y_{22.} = 28, \quad Y_{23.} = 20, \quad Y_{2..} = 74, \]
\[ Y_{...} = 133. \]

Further it is

\[ \sum_{i,j,k} y_{ijk}^2 = 1647, \qquad \sum_{i,j}\frac{Y_{ij.}^2}{n_{ij}} = 1605, \qquad \sum_{i}\frac{Y_{i..}^2}{N_{i.}} = 1554.75, \qquad \frac{Y_{...}^2}{N} = 1474.09. \]

Table 5.20 contains the SS, df, MS and the F-ratios. The 0.95-quantiles (for $\alpha = 0.05$) are $F(1, 7 \mid 0.95) = 5.59$ and $F(3, 7 \mid 0.95) = 4.35$. $H_{0A}$ is rejected, but not the hypothesis $H_{0B}$.
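The quantities of Example 5.16 follow directly from the SS formulas of Theorem 5.10. The Python sketch below recomputes them for the data of Example 5.13; the function name `nested_anova` and the nested-list layout are assumptions chosen for this illustration.

```python
# Observations of Example 5.13: obs[i][j] lists the values of level B_ij within A_i
obs = [
    [[14], [12, 15, 18]],
    [[12, 14], [6, 5, 10, 7], [8, 12]],
]


def nested_anova(obs):
    """Two-way nested classification with unequal subclass numbers (Theorem 5.10)."""
    a = len(obs)
    b_dot = sum(len(level) for level in obs)
    N = sum(len(cell) for level in obs for cell in level)
    total = sum(y for level in obs for cell in level for y in cell)
    sum_sq = sum(y * y for level in obs for cell in level for y in cell)
    cells = sum(sum(cell) ** 2 / len(cell) for level in obs for cell in level)   # sum Y_ij.^2 / n_ij
    levels = sum(sum(map(sum, level)) ** 2 / sum(map(len, level)) for level in obs)  # sum Y_i..^2 / N_i.
    corr = total ** 2 / N
    ss_a, ss_b, ss_res = levels - corr, cells - levels, sum_sq - cells
    df_a, df_b, df_res = a - 1, b_dot - a, N - b_dot
    ms_res = ss_res / df_res
    return {
        "SS_A": ss_a, "SS_BinA": ss_b, "SS_res": ss_res, "SS_T": sum_sq - corr,
        "df": (df_a, df_b, df_res),
        "F_A": ss_a / df_a / ms_res,
        "F_B": ss_b / df_b / ms_res,
    }


table = nested_anova(obs)
print(table)
```

The results agree with Table 5.20 up to rounding in the last printed digit.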


Table 5.20 Analysis of variance table of Example 5.16.

Source of variation    SS        df    MS       F
Between A              80.66     1     80.66    13.44
Between B within A     50.25     3     16.75    2.79
Residual               42        7     6.00
Total                  172.91    11

Hints for Programs
In SPSS a nested classification can be analysed only if we change the syntax for the DESIGN command. After
Analyze
General Linear Model
Univariate
both factors must be put on 'main effects'. Under 'Model' we choose in 'Sum of Squares' 'Type 1'. Back in the main menu after pressing 'Paste', you can change the syntax for the DESIGN command as shown below:

UNIANOVA
y BY a b
/METHOD=SSTYPE(1)
/INTERCEPT=INCLUDE
/CRITERIA=ALPHA(.05)
/DESIGN=a b(a).

We now show how the minimal experimental size can be determined for the nested classification. For testing the effects of A we choose:

> size.anova(model="a>b", hypothesis="a", a=6, b=4,
+ alpha=0.05, beta=0.1, delta=1, cases="minimin")
n
4
> size.anova(model="a>b", hypothesis="a", a=6, b=4,
+ alpha=0.05, beta=0.1, delta=1, cases="maximin")
n
9

We have to choose between 4 and 9 observations per level of factor B. For testing the effects of the factor B, we use

> size.anova(model="a>b", hypothesis="b", a=6, b=4,
+ alpha=0.05, beta=0.2, delta=1, cases="minimin")
n
5
> size.anova(model="a>b", hypothesis="b", a=6, b=4,
+ alpha=0.05, beta=0.2, delta=1, cases="maximin")
n
41


5.4 Three-Way Classification 


The principle underlying the two-way ANOVA (two-way classification) is also useful if more than two factors occur in an experiment. In this section we give only a short overview of the cases with three factors without proving all statements, because the principles of proof are similar to those in the case with two factors. Moreover, the statements valid for all cases have already been proven in Chapter 4 and at the beginning of this chapter.

We consider the case with three factors because it often occurs in applications, because it can be handled within a justifiable number of pages, and last but not least because, besides the cross-classification and the nested classification, a mixed classification occurs. At this point some remarks about the numerical analysis of experiments using ANOVA must be made. Certainly a general computer program for arbitrary classifications and numbers of factors following the theory of Chapter 4 with unequal class numbers can be written. However, such a program is not easy to apply even with modern computers, because the matrices $X^TX$ easily reach several tens of thousands of rows. Therefore we give, for some special cases of the three-way ANOVA, numerical solutions for which easy-to-use programs can be applied (in SAS, SPSS, R).

Problems with more than three factors are described in Method 3/51/0001 in Rasch et al. (2008).


5.4.1 Complete Cross-Classification (A x B x C) 


We assume that the observations of an experiment are influenced by three factors $A$, $B$, $C$ with $a$, $b$ and $c$ levels $A_1, \dots, A_a$, $B_1, \dots, B_b$ and $C_1, \dots, C_c$, respectively. For each possible combination $(A_i, B_j, C_k)$, let $n \ge 1$ observations $y_{ijkl}$ $(l = 1,\dots,n)$ be present. Each combination $(A_i, B_j, C_k)$ $(i = 1,\dots,a;\ j = 1,\dots,b;\ k = 1,\dots,c)$ of factor levels is called a class and characterised by $(i,j,k)$. The expectation in the population associated with the class $(i,j,k)$ is $\eta_{ijk}$.
Analogously to Definition 5.2 we define the following:

\[ \bar\eta_{i..} = \frac{\sum_{j,k}\eta_{ijk}}{bc} \quad \text{the expectation of the } i\text{th level of factor } A, \]

\[ \bar\eta_{.j.} = \frac{\sum_{i,k}\eta_{ijk}}{ac} \quad \text{the expectation of the } j\text{th level of factor } B \]

and

\[ \bar\eta_{..k} = \frac{\sum_{i,j}\eta_{ijk}}{ab} \quad \text{the expectation of the } k\text{th level of factor } C. \]

The total expectation is

\[ \mu = \bar\eta_{...} = \frac{\sum_{i,j,k}\eta_{ijk}}{abc}. \]

The main effects of the factors $A$, $B$ and $C$ are defined by

\[ a_i = \bar\eta_{i..} - \mu, \quad b_j = \bar\eta_{.j.} - \mu \quad \text{and} \quad c_k = \bar\eta_{..k} - \mu. \]

Assuming that the experiment is performed at a particular level $C_k$ of the factor $C$, we have a two-way classification with the factors $A$ and $B$, and the conditional interactions between the levels of the factors $A$ and $B$ for fixed $k$ are given by

\[ \eta_{ijk} - \bar\eta_{i.k} - \bar\eta_{.jk} + \bar\eta_{..k}. \qquad (5.38) \]

The interactions $(a,b)_{ij}$ between the $i$th A-level and the $j$th B-level are the means over all C-levels of the terms in (5.38), that is, $(a,b)_{ij}$ is defined as

\[ (a,b)_{ij} = \bar\eta_{ij.} - \bar\eta_{i..} - \bar\eta_{.j.} + \mu. \qquad (5.39) \]

The interactions between A-levels and C-levels $(a,c)_{ik}$ and between B-levels and C-levels $(b,c)_{jk}$ are defined by

\[ (a,c)_{ik} = \bar\eta_{i.k} - \bar\eta_{i..} - \bar\eta_{..k} + \mu \qquad (5.40) \]

and

\[ (b,c)_{jk} = \bar\eta_{.jk} - \bar\eta_{.j.} - \bar\eta_{..k} + \mu, \qquad (5.41) \]

respectively.

The difference between the conditional interactions between the levels of two of the three factors for the given level of the third factor and the (unconditional) interaction of these two factors depends only on the indices of the levels of the factors and not on the pair of factors for which the interaction is calculated. We call it the second-order interaction $(a,b,c)_{ijk}$ (between the levels of three factors). Without loss of generality we write

\[ (a,b,c)_{ijk} = \eta_{ijk} - \bar\eta_{ij.} - \bar\eta_{i.k} - \bar\eta_{.jk} + \bar\eta_{i..} + \bar\eta_{.j.} + \bar\eta_{..k} - \mu. \qquad (5.42) \]

The interactions between the levels of two factors defined by (5.39) to (5.41) are called first-order interactions. From the definition of the main effects and (5.39) to (5.42), we write for $\eta_{ijk}$

\[ \eta_{ijk} = \mu + a_i + b_j + c_k + (a,b)_{ij} + (a,c)_{ik} + (b,c)_{jk} + (a,b,c)_{ijk}. \]


Under the definitions above the side conditions, valid for all values of the indices not occurring in the respective summation, are

\[ \sum_i a_i = \sum_j b_j = \sum_k c_k = \sum_i (a,b)_{ij} = \sum_j (a,b)_{ij} = \sum_i (a,c)_{ik} = \sum_k (a,c)_{ik} \]
\[ = \sum_j (b,c)_{jk} = \sum_k (b,c)_{jk} = \sum_i (a,b,c)_{ijk} = \sum_j (a,b,c)_{ijk} = \sum_k (a,b,c)_{ijk} = 0. \qquad (5.43) \]
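The definitions (5.38) to (5.42) and the side conditions (5.43) can be illustrated with a small computation. The following Python sketch decomposes an array of class expectations $\eta_{ijk}$ into main effects and interactions using exact fractions; the function name `decompose` and the numerical values of $\eta_{ijk}$ are hypothetical.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def decompose(eta):
    """Main effects and interactions of a three-way layout of expectations eta[i][j][k]."""
    a, b, c = len(eta), len(eta[0]), len(eta[0][0])
    I, J, K = range(a), range(b), range(c)
    mu = mean(eta[i][j][k] for i, j, k in product(I, J, K))
    m_i = [mean(eta[i][j][k] for j, k in product(J, K)) for i in I]
    m_j = [mean(eta[i][j][k] for i, k in product(I, K)) for j in J]
    m_k = [mean(eta[i][j][k] for i, j in product(I, J)) for k in K]
    m_ij = [[mean(eta[i][j][k] for k in K) for j in J] for i in I]
    m_ik = [[mean(eta[i][j][k] for j in J) for k in K] for i in I]
    m_jk = [[mean(eta[i][j][k] for i in I) for k in K] for j in J]
    eff = {
        "a": [m_i[i] - mu for i in I],
        "b": [m_j[j] - mu for j in J],
        "c": [m_k[k] - mu for k in K],
        "ab": [[m_ij[i][j] - m_i[i] - m_j[j] + mu for j in J] for i in I],      # (5.39)
        "ac": [[m_ik[i][k] - m_i[i] - m_k[k] + mu for k in K] for i in I],      # (5.40)
        "bc": [[m_jk[j][k] - m_j[j] - m_k[k] + mu for k in K] for j in J],      # (5.41)
        "abc": [[[eta[i][j][k] - m_ij[i][j] - m_ik[i][k] - m_jk[j][k]
                  + m_i[i] + m_j[j] + m_k[k] - mu for k in K] for j in J] for i in I],  # (5.42)
    }
    return mu, eff


# Hypothetical expectations eta_ijk for a = 2, b = 3, c = 2 (illustration only)
raw = [[[10, 12], [11, 15], [9, 14]],
       [[13, 12], [16, 18], [10, 11]]]
eta = [[[Fraction(v) for v in row] for row in level] for level in raw]
mu, eff = decompose(eta)
print(mu, sum(eff["a"]), [sum(row) for row in eff["ab"]])
```

Summing any effect over one of its indices gives zero, as (5.43) requires, and adding all components restores $\eta_{ijk}$.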


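The $n$ observations $y_{ijkl}$ in each class are assumed to be independent of each other and $N(\eta_{ijk}, \sigma^2)$-distributed. The variable (called error term) $e_{ijkl}$ is the difference between $y_{ijkl}$ and the expectation $\eta_{ijk}$ of the class, that is, we have

\[ y_{ijkl} = \eta_{ijk} + e_{ijkl} \]

or

\[ y_{ijkl} = \mu + a_i + b_j + c_k + (a,b)_{ij} + (a,c)_{ik} + (b,c)_{jk} + (a,b,c)_{ijk} + e_{ijkl}. \qquad (5.44) \]

By the least squares method, we obtain under (5.43) the following estimators:

\[ \hat\mu = \bar{y}_{....} = \frac{1}{abcn}\sum_{i,j,k,l} y_{ijkl} \quad \text{for } \mu \]

as well as

\[ \hat{a}_i = \bar{y}_{i...} - \bar{y}_{....}, \]
\[ \hat{b}_j = \bar{y}_{.j..} - \bar{y}_{....}, \]
\[ \hat{c}_k = \bar{y}_{..k.} - \bar{y}_{....}, \]
\[ (\widehat{ab})_{ij} = \bar{y}_{ij..} - \bar{y}_{i...} - \bar{y}_{.j..} + \bar{y}_{....}, \]
\[ (\widehat{ac})_{ik} = \bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....}, \]
\[ (\widehat{bc})_{jk} = \bar{y}_{.jk.} - \bar{y}_{.j..} - \bar{y}_{..k.} + \bar{y}_{....}, \]
\[ (\widehat{abc})_{ijk} = \bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} - \bar{y}_{.jk.} + \bar{y}_{i...} + \bar{y}_{.j..} + \bar{y}_{..k.} - \bar{y}_{....}. \]

We may split $SS_{total} = \sum_{i,j,k,l}\left(y_{ijkl} - \bar{y}_{....}\right)^2$ into eight components: three corresponding with the main effects, three with the first-order interactions, one with the second-order interaction and one with the error term or the residual. The corresponding SS are shown in the ANOVA table (Tables 5.21 and 5.22). In these tables $N$ is again the total number of observations, $N = abcn$.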

The following hypotheses can be tested under (5.44) ($H_{0x}$ is one of the hypotheses $H_{0A}, \dots, H_{0ABC}$; $SS_x$ is the corresponding SS):

\[ H_{0A}: a_i = 0 \ (\text{for all } i), \]
\[ H_{0B}: b_j = 0 \ (\text{for all } j), \]
\[ H_{0C}: c_k = 0 \ (\text{for all } k), \]
\[ H_{0AB}: (a,b)_{ij} = 0 \ (\text{for all } i,j), \]
\[ H_{0AC}: (a,c)_{ik} = 0 \ (\text{for all } i,k), \]
\[ H_{0BC}: (b,c)_{jk} = 0 \ (\text{for all } j,k), \]
\[ H_{0ABC}: (a,b,c)_{ijk} = 0 \ (\text{for all } i,j,k, \text{ if } n > 1). \]

Under the hypothesis $H_{0x}$, $\frac{1}{\sigma^2}SS_x$ and $\frac{1}{\sigma^2}SS_{res}$ are independent of each other and centrally $\chi^2$-distributed with the df given in the ANOVA table. Therefore the test statistics given in the column F of the ANOVA table are centrally F-distributed with the corresponding degrees of freedom. For $n = 1$ all hypotheses except $H_{0ABC}$ can be tested under the assumption $(a,b,c)_{ijk} = 0$ for all $i, j, k$, because then $\frac{1}{\sigma^2}SS_{ABC} = \frac{1}{\sigma^2}SS_{res}$ and $\frac{1}{\sigma^2}SS_x$ under $H_{0x}$ ($x = A, B, C$ etc.) are independent of each other and $\chi^2$-distributed. The test statistic $F_x$ is given by

\[ F_x = \frac{(a-1)(b-1)(c-1)}{df_x} \cdot \frac{SS_x}{SS_{res}}. \]

The calculation of a three-way ANOVA can be done as if we had three two-way ANOVAs. We demonstrate this by the following example.


Example 5.17 The observations of Example 5.9 can be considered as those of 
a three-way ANOVA with single class numbers (n = 1) if as factors we use the 
forage crop (A), the kind of storage (B — barn or refrigerator) and the packaging 
material (C — glass or sack) (Table 5.22). We have a=b=c=2 and n=1. The 
observations in Table 5.22 can be arranged in three tables of a two-way classi- 
fication where the new ‘observations’ are the sums over the third factor of the 
original observations in the classes defined by the levels of the two factors 
selected (Tables 5.23, 5.24 and 5.25). Table 5.26 is the ANOVA table of the 
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Table 5.21 Analysis of variance table of a three-way cross-classification with equal subclass numbers 
(model i). 


Source of variation SS df 
Between A-levels ab Y?. - Yt. 3 a-1 
Between B-levels SSp= as y2 - ih b-1 
B acn jo Nee 
Between C-levels Sc= Ti —yY, = oY, cot 
i 
Interaction A x B SSap = a ar Ve (a-1)(b-1) 
1 2 
pres 
1 1 
Interaction A x C SSac = i ee = rae te (a-1)(c-1) 
1 ane & 
= es + 
1 1 
Interaction B x C SSzc = ak 7 = _ (b-1)(c-1) 
~ abn ae “kt 
Interaction A x B x C ae ares A (4a-1)(6-1)(c-1) 
-SSap-SSac —SSpc - SSres 
1 
heat, z 2 2 = 
Within the SSres = einmiH - oye ie abc(n-1) 
classes (residual) 
y 
Total SSr = ar aR oan (N-1) 
MS E(MS) F 
S b z= 
Mgjoo ota Om 2 abc(n-1) SS4 
a-1l a-l U a-l SS yes 
SSp pees : abc(n-1) SS 
MS; = —— o +——_ ) b EME. 
aie b-1- b-1 SSyes 
MSc = SSc os abn 2 abc(n-1) SSc 
c-1 k c-1 SSyes 
SSap 2 cn abc(n-1) SSap 
MS + (a,b); SEEMED ods 
48 (a—1)(b-1) a-1 pa) : a-1)(b-1) SSyes 
SS 4c bn abc(n-1) SSac 
MS 4c = 2 : SAUL 2 
Ae (a= Dea) ee aad pay Cit a—1)(e~1) SSyes 
SSzc 2 an abc(n-1) SSgc 
MSc = * (b,c) _abe(n=1)_ 
oS (Beat) b-1 ips - b-1)(c-1) SSyes 
SS Bc 2 n abc(n-1) SSapc 
MS + (a,b,c); 
MG be) a-1)(b-1)(c np  G@=1)(b~1)(e~1) SS 
2. 
MS yes s SSres 2 
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Table 5.22 Three-way classification of the observations of Table 5.14 with factors kind of 
storage, packaging material and forage crop. 


Forage crop 


Kind of storage Packaging material Green rye Lucerne 

Refrigerator Glass 8.39 9.44 
Sack 5.42 5.56 

Barn Glass 11.58 12.21 
Sack 9.53 10.39 


Table 5.23 Two-way classification of the observations of Table 5.14 with factors kind of 
storage and forage crop (Yj). 


Forage crop 


Green rye Lucerne Yi; y? 
Kind of storage Refrigerator 13.81 15.00 28.81 830.0161 
Barn 21.11 22.60 43.71 1910.5641 
Ys 34.92 37.60 72.52 2740.5802 
Y? 1219.4064 1413.76 2633.1664. 


in 


Table 5.24 Two-way classification of the observations of Table 5.14 with factors packaging 
material and forage crop (Yj). 


Forage crop 


Green rye Lucerne Y.x v7; 
Packaging material Glass 19.97 21.65 41.62 1732.2244 
Sack 14.95 15.94 30.90 954.8100 
Vy 34.92 37.60 72.52 2687.0344 


Y? 1219.4064 1413.76 2633.1664. 
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Table 5.25 Two-way classification of the observations of Table 5.14 with factors kind of 


storage and packaging material (Yj). 


                       Packaging material
Kind of storage    Glass        Sack         $Y_{i..}$    $Y_{i..}^2$
Refrigerator        17.83       10.98         28.81        830.0161
Barn                23.79       19.92         43.71       1910.5641
$Y_{.j.}$           41.62       30.90         72.52       2740.5802
$Y_{.j.}^2$       1732.2244    954.8100     2687.0344


Table 5.26 Analysis of variance table of Example 5.17.

Source of variation                                     SS       df      MS
Between kind of storage                              27.7513     1    27.7513
Between forage crops                                  0.8978     1     0.8978
Between packaging material                           14.3648     1    14.3648
Interaction kind of storage x packaging material      1.1100     1     1.1100
Interaction kind of storage x forage crops            0.0112     1     0.0112
Interaction forage crops x packaging material         0.0578     1     0.0578
Residual                                              0.1625     1     0.1625
Total                                                44.3554     7


example. The F-tests are done under the assumption that all second-order interactions vanish, using the $SS_{res}$ defined above. Only between the kinds of storage could significant differences ($\alpha = 0.05$) be found, that is, only the hypothesis $H_{0A}$ is rejected.
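
The sums of squares of Table 5.26 can be retraced from the eight observations of Table 5.22. The following Python sketch is only an illustration: the helper `marginal_ss` and all variable names are assumptions made for this example and belong neither to OPDOE nor to SPSS. Because there is one observation per cell, the residual SS is obtained by subtraction, as in the example.

```python
from itertools import product

# Observations of Table 5.22, indexed y[storage][packaging][crop]
y = [[[8.39, 9.44], [5.42, 5.56]],      # refrigerator: glass, sack
     [[11.58, 12.21], [9.53, 10.39]]]   # barn: glass, sack
a, b, c = 2, 2, 2
N = a * b * c
idx = list(product(range(a), range(b), range(c)))
corr = sum(y[i][j][k] for i, j, k in idx) ** 2 / N   # Y^2 / N

def marginal_ss(keep):
    """Sum of squared marginal totals, divided by the number of cells per total."""
    sums = {}
    for cell in idx:
        key = tuple(cell[t] for t in keep)
        sums[key] = sums.get(key, 0.0) + y[cell[0]][cell[1]][cell[2]]
    return sum(s * s for s in sums.values()) / (N // len(sums))

ss_storage = marginal_ss([0]) - corr
ss_packaging = marginal_ss([1]) - corr
ss_crop = marginal_ss([2]) - corr
ss_storage_packaging = marginal_ss([0, 1]) - marginal_ss([0]) - marginal_ss([1]) + corr
ss_storage_crop = marginal_ss([0, 2]) - marginal_ss([0]) - marginal_ss([2]) + corr
ss_crop_packaging = marginal_ss([1, 2]) - marginal_ss([1]) - marginal_ss([2]) + corr
ss_total = sum(y[i][j][k] ** 2 for i, j, k in idx) - corr
ss_res = ss_total - (ss_storage + ss_packaging + ss_crop + ss_storage_packaging
                     + ss_storage_crop + ss_crop_packaging)

# All degrees of freedom equal 1, so each F-value is simply SS / SS_res.
f_storage = ss_storage / ss_res
f_packaging = ss_packaging / ss_res
```

Only `f_storage` exceeds the 0.95-quantile of the F(1, 1)-distribution (about 161.4), in agreement with the conclusion drawn from Table 5.26.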


Hints for Programs

The size of the experiment is again calculated with the help of OPDOE in R. The syntax is analogous to that of the one- and two-way ANOVA. We demonstrate the calculation of the sizes needed for testing the null hypotheses for factor A and for the interactions A x B in a balanced experiment with a = 3, b = 4 and c = 3:

> size.anova(model="axbxc", hypothesis="a", a=3, b=4, c=3,
+ alpha=0.05, beta=0.1, delta=0.5, cases="minimin")
n
6

> size.anova(model="axbxc", hypothesis="a", a=3, b=4, c=3,
+ alpha=0.05, beta=0.1, delta=0.5, cases="maximin")
n
9

> size.anova(model="axbxc", hypothesis="axb", a=3, b=4,
+ c=3, alpha=0.05, beta=0.1, delta=1, cases="minimin")
n
3

> size.anova(model="axbxc", hypothesis="axb", a=3, b=4,
+ c=3, alpha=0.05, beta=0.1, delta=1, cases="maximin")
n
12


5.4.2 Nested Classification (C<B<A)


We speak about a three-way nested classification if factor C is subordinated to factor B (as described in Section 5.3.2) and factor B is subordinated to factor A, that is, if C<B<A. We assume as in Section 5.3.2 that the random variable $y_{ijkl}$ varies randomly with expected value $\eta_{ijk}$ $(i = 1,\dots,a;\ j = 1,\dots,b_i;\ k = 1,\dots,c_{ij})$, that is, we assume

$$y_{ijkl} = \eta_{ijk} + e_{ijkl} \quad (l = 1,\dots,n_{ijk}),$$

where the $e_{ijkl}$ are independent of each other and $N(0,\sigma^2)$-distributed. By

$$\mu = \bar{\eta}_{...} = \frac{1}{N}\sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}} n_{ijk}\,\eta_{ijk}$$

we define the total mean of the experiment, with $N = \sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}} n_{ijk}$.
We generalise Definition 5.5 by


Definition 5.6 The difference $a_i = \bar{\eta}_{i..} - \mu$ is called the effect of the ith level of A, the difference $b_{ij} = \bar{\eta}_{ij.} - \bar{\eta}_{i..}$ is called the effect of the jth level of B within the ith level of A and the difference $c_{ijk} = \eta_{ijk} - \bar{\eta}_{ij.}$ is called the effect of the kth level of C within the jth level of B and the ith level of A.


Then the observations can be modelled by

$$y_{ijkl} = \mu + a_i + b_{ij} + c_{ijk} + e_{ijkl}. \quad (5.45)$$

There exist no interactions. We consider (5.45) with $N_{ij.} = \sum_{k} n_{ijk}$, $N_{i..} = \sum_{j,k} n_{ijk}$ under the side conditions

$$\sum_{i=1}^{a} N_{i..}\,a_i = \sum_{j=1}^{b_i} N_{ij.}\,b_{ij} = \sum_{k=1}^{c_{ij}} n_{ijk}\,c_{ijk} = 0. \quad (5.46)$$

Minimising

$$\sum_{i=1}^{a}\sum_{j=1}^{b_i}\sum_{k=1}^{c_{ij}}\sum_{l=1}^{n_{ijk}} \left(y_{ijkl} - \mu - a_i - b_{ij} - c_{ijk}\right)^2 \quad (5.47)$$

under the side conditions (5.46) leads to the BLUE of the parameters as follows:

$$\hat{\mu} = \bar{y}_{....},\quad \hat{a}_i = \bar{y}_{i...} - \bar{y}_{....},\quad \hat{b}_{ij} = \bar{y}_{ij..} - \bar{y}_{i...},\quad \hat{c}_{ijk} = \bar{y}_{ijk.} - \bar{y}_{ij..}.$$

Without proof we give a theorem about the decomposition of $SS_{total} = SS_T$.


Theorem 5.12 In a three-way nested classification, we have

$$SS_T = SS_A + SS_{B\ in\ A} + SS_{C\ in\ B\ (and\ A)} + SS_{res}$$

with $N = \sum_{i,j,k} n_{ijk}$, $N_{i..} = \sum_{j,k} n_{ijk}$, $N_{ij.} = \sum_{k} n_{ijk}$ and

$$SS_T = \sum_{i,j,k,l} y_{ijkl}^2 - \frac{Y_{....}^2}{N},\qquad SS_A = \sum_{i} \frac{Y_{i...}^2}{N_{i..}} - \frac{Y_{....}^2}{N},$$

$$SS_{B\ in\ A} = \sum_{i,j} \frac{Y_{ij..}^2}{N_{ij.}} - \sum_{i} \frac{Y_{i...}^2}{N_{i..}},\qquad SS_{C\ in\ B} = \sum_{i,j,k} \frac{Y_{ijk.}^2}{n_{ijk}} - \sum_{i,j} \frac{Y_{ij..}^2}{N_{ij.}},$$

$$SS_{res} = \sum_{i,j,k,l} y_{ijkl}^2 - \sum_{i,j,k} \frac{Y_{ijk.}^2}{n_{ijk}}.$$

With $B_. = \sum_i b_i$ and $C_{..} = \sum_{i,j} c_{ij}$, the quantities $\frac{1}{\sigma^2}SS_A$ up to $\frac{1}{\sigma^2}SS_{C\ in\ B}$ are pairwise independently $CS(a-1,\lambda_a)$-, $CS(B_.-a,\lambda_b)$- and $CS(C_{..}-B_.,\lambda_c)$-distributed, and $\frac{1}{\sigma^2}SS_{res}$ is $CS(N-C_{..})$-distributed, where the corresponding non-centrality parameters are calculated analogously to

$$\lambda = \frac{1}{\sigma^2}\,\beta^T X^T\left(X(X^TX)^{-}X^T - \frac{1}{N}1_{N,N}\right)X\beta$$

in Section 5.1 by multiplying the quadratic form of the SS with the corresponding expectations.

The non-centrality parameters $\lambda_a$, $\lambda_b$ and $\lambda_c$ vanish under the null hypotheses $H_{0A}: a_i = 0$ $(i = 1,\dots,a)$, $H_{0B}: b_{ij} = 0$ $(i = 1,\dots,a;\ j = 1,\dots,b_i)$ and $H_{0C}: c_{ijk} = 0$ $(i = 1,\dots,a;\ j = 1,\dots,b_i;\ k = 1,\dots,c_{ij})$, so that the result of Theorem 5.12 can be used for constructing the F-statistics. Table 5.27 shows the SS and MS for calculating the F-statistics. If $H_{0A}$ is valid, then $F_A$ is $F(a-1, N-C_{..})$-distributed. If $H_{0B}$ is valid, then $F_B$ is $F(B_.-a, N-C_{..})$-distributed, and if $H_{0C}$ is valid, then $F_C$ is $F(C_{..}-B_., N-C_{..})$-distributed.
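
The decomposition of Theorem 5.12 and the degrees of freedom $a-1$, $B_.-a$, $C_{..}-B_.$ and $N-C_{..}$ can be checked numerically. The following Python sketch uses a small unbalanced data set that is purely hypothetical; the nested-list layout `data[i][j][k]` and the helper `sq_over_n` are assumptions made for this illustration.

```python
# Hypothetical unbalanced data: data[i][j][k] holds the observations y_ijkl of the
# k-th level of C within the j-th level of B within the i-th level of A.
data = [
    [[[4.1, 3.9], [5.0, 5.2, 4.8]], [[6.1], [5.9, 6.3]]],
    [[[7.0, 7.4], [6.8]], [[8.1, 7.9, 8.3], [7.5, 7.7]], [[9.0, 8.6]]],
]

def sq_over_n(groups):
    """Sum over all groups of (group total)^2 / (group size)."""
    return sum(sum(g) ** 2 / len(g) for g in groups)

c_groups = [cell for A in data for B in A for cell in B]               # classes of C in B in A
b_groups = [[v for cell in B for v in cell] for A in data for B in A]  # classes of B in A
a_groups = [[v for B in A for cell in B for v in cell] for A in data]  # classes of A
all_obs = [v for g in a_groups for v in g]

N = len(all_obs)
corr = sum(all_obs) ** 2 / N                 # Y^2 / N
sum_sq = sum(v * v for v in all_obs)

ss_total = sum_sq - corr
ss_a = sq_over_n(a_groups) - corr
ss_b_in_a = sq_over_n(b_groups) - sq_over_n(a_groups)
ss_c_in_b = sq_over_n(c_groups) - sq_over_n(b_groups)
ss_res = sum_sq - sq_over_n(c_groups)

a, B_dot, C_dotdot = len(a_groups), len(b_groups), len(c_groups)
df = {"A": a - 1, "B in A": B_dot - a, "C in B": C_dotdot - B_dot, "res": N - C_dotdot}

# The F-statistics of Table 5.27 are ratios of mean squares to MS_res.
ms_res = ss_res / df["res"]
f_a = (ss_a / df["A"]) / ms_res
```

The four sums of squares add up to `ss_total`, and the four degrees of freedom add up to N - 1.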
Hints for Programs
For the analysis in SPSS we have, analogously to Section 5.3.2, to change the syntax in the /DESIGN command. The minimal subclass numbers for the three tests of the main effects with OPDOE in R give the following results:
> size.anova(model="a>b>c", hypothesis="a", a=2, b=2, c=3,
+ alpha=0.01, beta=0.1, delta=0.5, cases="minimin")
n
21
> size.anova(model="a>b>c", hypothesis="a", a=2, b=2, c=3,
+ alpha=0.01, beta=0.1, delta=1, cases="minimin")
n
6
> size.anova(model="a>b>c", hypothesis="b", a=2, b=2, c=3,
+ alpha=0.01, beta=0.1, delta=1, cases="minimin")
n
7
> size.anova(model="a>b>c", hypothesis="c", a=2, b=2, c=3,
+ alpha=0.01, beta=0.1, delta=1, cases="minimin")
n
10
The maximin values are left for the reader as an exercise.
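Table 5.27 Analysis of variance table of a three-way nested classification for model I.

Between A: $SS_A = \sum_i \frac{Y_{i...}^2}{N_{i..}} - \frac{Y_{....}^2}{N}$; df $a-1$; $MS_A = \frac{SS_A}{a-1}$; E(MS) under (5.46): $\sigma^2 + \frac{1}{a-1}\sum_{i} N_{i..}\,a_i^2$; $F_A = \frac{MS_A}{MS_{res}}$
Between B in A: $SS_{B\ in\ A} = \sum_{i,j} \frac{Y_{ij..}^2}{N_{ij.}} - \sum_i \frac{Y_{i...}^2}{N_{i..}}$; df $B_.-a$; $MS_{B\ in\ A} = \frac{SS_{B\ in\ A}}{B_.-a}$; E(MS) under (5.46): $\sigma^2 + \frac{1}{B_.-a}\sum_{i,j} N_{ij.}\,b_{ij}^2$; $F_B = \frac{MS_{B\ in\ A}}{MS_{res}}$
Between C in B and A: $SS_{C\ in\ B} = \sum_{i,j,k} \frac{Y_{ijk.}^2}{n_{ijk}} - \sum_{i,j} \frac{Y_{ij..}^2}{N_{ij.}}$; df $C_{..}-B_.$; $MS_{C\ in\ B} = \frac{SS_{C\ in\ B}}{C_{..}-B_.}$; E(MS) under (5.46): $\sigma^2 + \frac{1}{C_{..}-B_.}\sum_{i,j,k} n_{ijk}\,c_{ijk}^2$; $F_C = \frac{MS_{C\ in\ B}}{MS_{res}}$
Residual: $SS_{res} = \sum_{i,j,k,l} y_{ijkl}^2 - \sum_{i,j,k} \frac{Y_{ijk.}^2}{n_{ijk}}$; df $N-C_{..}$; $MS_{res} = \frac{SS_{res}}{N-C_{..}}$; E(MS): $\sigma^2$
Total: $SS_T = \sum_{i,j,k,l} y_{ijkl}^2 - \frac{Y_{....}^2}{N}$; df $N-1$

5.4.3 Mixed Classification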


In experiments with three or more factors under test, we often find, besides a cross-classification or a nested classification, a further type of classification, the so-called mixed (partially nested) classification. In the three-way ANOVA, two mixed classifications occur (Rasch, 1971).


5.4.3.1 Cross-Classification between Two Factors Where One of Them Is Subordinated to a Third Factor ((B<A) x C)

If in a balanced experiment a factor B is subordinated to a factor A and both are cross-classified with a factor C, then the corresponding model equation is given by

$$y_{ijkl} = \mu + a_i + b_{ij} + c_k + (a,c)_{ik} + (b,c)_{jk(i)} + e_{ijkl} \quad (5.48)$$

$$(i = 1,\dots,a;\ j = 1,\dots,b;\ k = 1,\dots,c;\ l = 1,\dots,n),$$

where $\mu$ is the general experimental mean, $a_i$ is the effect of the ith level of factor A, $b_{ij}$ is the effect of the jth level of factor B within the ith level of factor A and $c_k$ is the effect of the kth level of factor C. Further $(a,c)_{ik}$ and $(b,c)_{jk(i)}$ are the corresponding interaction effects and $e_{ijkl}$ are the random error terms.

Model equation (5.48) is considered under the side conditions, valid for all indices not occurring in the summation,

$$\sum_{i=1}^{a} a_i = \sum_{j=1}^{b} b_{ij} = \sum_{k=1}^{c} c_k = \sum_{i=1}^{a} (a,c)_{ik} = \sum_{k=1}^{c} (a,c)_{ik} = \sum_{j=1}^{b} (b,c)_{jk(i)} = \sum_{k=1}^{c} (b,c)_{jk(i)} = 0 \quad (5.49)$$

and

$$E(e_{ijkl}) = 0,\quad E(e_{ijkl}\,e_{i'j'k'l'}) = \delta_{ii'}\delta_{jj'}\delta_{kk'}\delta_{ll'}\,\sigma^2,\quad \sigma^2 = var(e_{ijkl}) \quad (5.50)$$

(for all i, j, k, l).

The observations $y_{ijkl}$ $(i = 1,\dots,a;\ j = 1,\dots,b;\ k = 1,\dots,c;\ l = 1,\dots,n)$ are allocated as shown in Table 5.28 (we restrict ourselves to the so-called balanced case where the number of B-levels is equal for all A-levels and the subclass numbers are equal). For the sum of squared deviations of the random variables $y_{ijkl}$ $(i = 1,\dots,a;\ j = 1,\dots,b;\ k = 1,\dots,c;\ l = 1,\dots,n)$ from their arithmetic mean

$$SS_T = \sum_{i,j,k,l} \left(y_{ijkl} - \bar{y}_{....}\right)^2 = \sum_{i,j,k,l} y_{ijkl}^2 - \frac{Y_{....}^2}{N} \quad (N = abcn),$$
Table 5.28 Observations of a mixed three-way classification, B cross-classified with C and B nested in A ((B<A) x C).

Levels of A   Levels of B   Levels of C
                            $C_1$                         $C_2$                         ...   $C_c$
$A_1$         $B_{11}$      $y_{1111},\dots,y_{111n}$     $y_{1121},\dots,y_{112n}$     ...   $y_{11c1},\dots,y_{11cn}$
              $B_{12}$      $y_{1211},\dots,y_{121n}$     $y_{1221},\dots,y_{122n}$     ...   $y_{12c1},\dots,y_{12cn}$
              ...
              $B_{1b}$      $y_{1b11},\dots,y_{1b1n}$     $y_{1b21},\dots,y_{1b2n}$     ...   $y_{1bc1},\dots,y_{1bcn}$
$A_2$         $B_{21}$      $y_{2111},\dots,y_{211n}$     $y_{2121},\dots,y_{212n}$     ...   $y_{21c1},\dots,y_{21cn}$
              $B_{22}$      $y_{2211},\dots,y_{221n}$     $y_{2221},\dots,y_{222n}$     ...   $y_{22c1},\dots,y_{22cn}$
              ...
              $B_{2b}$      $y_{2b11},\dots,y_{2b1n}$     $y_{2b21},\dots,y_{2b2n}$     ...   $y_{2bc1},\dots,y_{2bcn}$
...
$A_a$         $B_{a1}$      $y_{a111},\dots,y_{a11n}$     $y_{a121},\dots,y_{a12n}$     ...   $y_{a1c1},\dots,y_{a1cn}$
              $B_{a2}$      $y_{a211},\dots,y_{a21n}$     $y_{a221},\dots,y_{a22n}$     ...   $y_{a2c1},\dots,y_{a2cn}$
              ...
              $B_{ab}$      $y_{ab11},\dots,y_{ab1n}$     $y_{ab21},\dots,y_{ab2n}$     ...   $y_{abc1},\dots,y_{abcn}$

we have

$$SS_T = SS_A + SS_{B\ in\ A} + SS_C + SS_{A \times C} + SS_{B \times C\ in\ A} + SS_{res},$$

where

$$SS_A = \frac{1}{bcn}\sum_{i=1}^{a} Y_{i...}^2 - \frac{Y_{....}^2}{N}$$

is the SS between the levels of A,

$$SS_{B\ in\ A} = \frac{1}{cn}\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij..}^2 - \frac{1}{bcn}\sum_{i=1}^{a} Y_{i...}^2$$

the SS between the levels of B within the levels of A,

$$SS_C = \frac{1}{abn}\sum_{k=1}^{c} Y_{..k.}^2 - \frac{Y_{....}^2}{N}$$

the SS between the levels of C,

$$SS_{A \times C} = \frac{1}{bn}\sum_{i=1}^{a}\sum_{k=1}^{c} Y_{i.k.}^2 - \frac{1}{bcn}\sum_{i=1}^{a} Y_{i...}^2 - \frac{1}{abn}\sum_{k=1}^{c} Y_{..k.}^2 + \frac{Y_{....}^2}{N}$$

the SS for the interactions A x C,

$$SS_{B \times C\ in\ A} = \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} Y_{ijk.}^2 - \frac{1}{cn}\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij..}^2 - \frac{1}{bn}\sum_{i=1}^{a}\sum_{k=1}^{c} Y_{i.k.}^2 + \frac{1}{bcn}\sum_{i=1}^{a} Y_{i...}^2$$

the SS for the interactions B x C within the levels of A and

$$SS_{res} = \sum_{i,j,k,l} y_{ijkl}^2 - \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} Y_{ijk.}^2$$

the SS within the classes. The N - 1 degrees of freedom of $SS_T$ can be split, corresponding to the components of $SS_T$, into six components. These components are shown in Table 5.29; the third column of Table 5.29 contains the MS gained from the SS by division by the degrees of freedom.

If hypotheses have to be tested about the constants in model equation (5.48), we additionally have to assume that the $e_{ijkl}$ are normally distributed. The hypotheses can then again be tested by F-tests. The choice of the correct test statistic for a particular hypothesis can easily be found heuristically. For this the expectation E(MS) of the MS must be known. The E(MS) can be found in the last column of Table 5.29. As a representative of the derivation of an E(MS), we show the approach for $E(MS_A)$. We have

$$E(MS_A) = \frac{1}{a-1}E(SS_A) = \frac{1}{a-1}\left[\frac{1}{bcn}\sum_{i=1}^{a} E\left(Y_{i...}^2\right) - \frac{1}{N}E\left(Y_{....}^2\right)\right].$$

Now we replace the $y_{ijkl}$ by the right side of the model equation (5.48) and obtain

$$Y_{i...} = bcn\mu + bcn\,a_i + cn\sum_{j=1}^{b} b_{ij} + bn\sum_{k=1}^{c} c_k + bn\sum_{k=1}^{c} (a,c)_{ik} + n\sum_{j=1}^{b}\sum_{k=1}^{c} (b,c)_{jk(i)} + \sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{l=1}^{n} e_{ijkl}$$

and using (5.49)

$$Y_{i...} = bcn\mu + bcn\,a_i + \sum_{j,k,l} e_{ijkl}.$$

Table 5.29 Analysis of variance table for a balanced three-way mixed classification (B<A) x C for model I (n > 1).
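Between the levels of A: $SS_A = \frac{1}{bcn}\sum_{i} Y_{i...}^2 - \frac{Y_{....}^2}{N}$; df $a-1$; $MS_A = \frac{SS_A}{a-1}$; E(MS) under (5.49): $\sigma^2 + \frac{bcn}{a-1}\sum_{i} a_i^2$
Between the levels of B within the levels of A: $SS_{B\ in\ A} = \frac{1}{cn}\sum_{i,j} Y_{ij..}^2 - \frac{1}{bcn}\sum_{i} Y_{i...}^2$; df $a(b-1)$; $MS_{B\ in\ A} = \frac{SS_{B\ in\ A}}{a(b-1)}$; E(MS) under (5.49): $\sigma^2 + \frac{cn}{a(b-1)}\sum_{i,j} b_{ij}^2$
Between the levels of C: $SS_C = \frac{1}{abn}\sum_{k} Y_{..k.}^2 - \frac{Y_{....}^2}{N}$; df $c-1$; $MS_C = \frac{SS_C}{c-1}$; E(MS) under (5.49): $\sigma^2 + \frac{abn}{c-1}\sum_{k} c_k^2$
Interaction A x C: $SS_{A \times C} = \frac{1}{bn}\sum_{i,k} Y_{i.k.}^2 - \frac{1}{bcn}\sum_{i} Y_{i...}^2 - \frac{1}{abn}\sum_{k} Y_{..k.}^2 + \frac{Y_{....}^2}{N}$; df $(a-1)(c-1)$; $MS_{A \times C} = \frac{SS_{A \times C}}{(a-1)(c-1)}$; E(MS) under (5.49): $\sigma^2 + \frac{bn}{(a-1)(c-1)}\sum_{i,k} (a,c)_{ik}^2$
Interaction B x C within the levels of A: $SS_{B \times C\ in\ A} = \frac{1}{n}\sum_{i,j,k} Y_{ijk.}^2 - \frac{1}{cn}\sum_{i,j} Y_{ij..}^2 - \frac{1}{bn}\sum_{i,k} Y_{i.k.}^2 + \frac{1}{bcn}\sum_{i} Y_{i...}^2$; df $a(b-1)(c-1)$; $MS_{B \times C\ in\ A} = \frac{SS_{B \times C\ in\ A}}{a(b-1)(c-1)}$; E(MS) under (5.49): $\sigma^2 + \frac{n}{a(b-1)(c-1)}\sum_{i,j,k} (b,c)_{jk(i)}^2$
Residual: $SS_{res} = \sum_{i,j,k,l} y_{ijkl}^2 - \frac{1}{n}\sum_{i,j,k} Y_{ijk.}^2$; df $N-abc = abc(n-1)$; $MS_{res} = \frac{SS_{res}}{abc(n-1)}$; E(MS): $\sigma^2$
Total: $SS_T = \sum_{i,j,k,l} y_{ijkl}^2 - \frac{Y_{....}^2}{N}$; df $N-1$

Analogously we obtain under (5.49)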


$$Y_{....} = abcn\mu + \sum_{i,j,k,l} e_{ijkl}.$$

Now we obtain for $E(Y_{i...}^2)$ the equation

$$E\left(Y_{i...}^2\right) = b^2c^2n^2\mu^2 + b^2c^2n^2a_i^2 + 2b^2c^2n^2\mu a_i + bcn\sigma^2$$

and for $E(Y_{....}^2)$ the equation

$$E\left(Y_{....}^2\right) = N^2\mu^2 + N\sigma^2.$$

With these two equations and $\sum_i a_i = 0$ from (5.49) we get

$$E(MS_A) = \sigma^2 + \frac{bcn}{a-1}\sum_{i=1}^{a} a_i^2.$$

The hypothesis $H_{0A}: a_i = 0$ can be tested with the help of the statistic $F_A = \frac{MS_A}{MS_{res}}$, which under $H_{0A}$ is F-distributed with a - 1 and N - abc degrees of freedom. If the null hypothesis is correct, numerator and denominator of $F_A$ (from Table 5.29) have the same expectation. In general, a ratio of two MS is, under a particular null hypothesis, centrally F-distributed with the corresponding degrees of freedom if the numerator and the denominator have the same expectation when the hypothesis is valid. This equality is however not sufficient if unequal subclass numbers occur; for instance, it is not sufficient if the MS are not independent of each other. In this case we obtain in the way shown above only a test statistic that is approximately F-distributed. In the following we will not differentiate between exactly and approximately F-distributed test statistics. From Table 5.29 we see that in our model the hypotheses about all effects ($a_i$, $b_{ij}$, ..., $(b,c)_{jk(i)}$) can be tested by using the ratios of the corresponding MS and $MS_{res}$ as test statistics.

As an example we consider again testing pig fattening for male and female (factor C) offspring of sows (factor B) nested in boars (factor A). The observed character is the number of fattening days an animal needed to grow from 40 to 110 kg (compare Example 5.6).

We again give an example for the calculation of the experimental size using the notation of OPDOE in R as in the other sections:

> size.anova(model="(axb)>c", hypothesis="a", a=6, b=5,
+ c=4, alpha=0.05, beta=0.1, delta=0.5, cases="minimin")
n
3

5.4.3.2 Cross-Classification of Two Factors in Which a Third Factor Is Nested (C<(A x B))

If two cross-classified factors (A x B) are super-ordered to a third factor (C), we have another mixed classification. The model equation for the random observations in a balanced design is given by

$$y_{ijkl} = \mu + a_i + b_j + c_{ijk} + (a,b)_{ij} + e_{ijkl} \quad (i = 1,\dots,a;\ j = 1,\dots,b;\ k = 1,\dots,c;\ l = 1,\dots,n). \quad (5.51)$$

This is again the situation of model I, where the error terms $e_{ijkl}$ fulfil condition (5.50).

Analogously to (5.48) we assume that, for all values of the indices not occurring in the summation, we have the side conditions

$$\sum_{i=1}^{a} a_i = \sum_{j=1}^{b} b_j = \sum_{k=1}^{c} c_{ijk} = \sum_{i=1}^{a} (a,b)_{ij} = \sum_{j=1}^{b} (a,b)_{ij} = 0. \quad (5.52)$$

The total sum of squared deviations can be split into components

$$SS_T = SS_A + SS_B + SS_{C\ in\ AB} + SS_{A \times B} + SS_{res}$$

with $SS_A$ and $SS_B$ as given in Table 5.31,

$$SS_{C\ in\ AB} = \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} Y_{ijk.}^2 - \frac{1}{cn}\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij..}^2$$

the SS between the C-levels within the A x B combinations,

$$SS_{A \times B} = \frac{1}{cn}\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij..}^2 - \frac{1}{bcn}\sum_{i=1}^{a} Y_{i...}^2 - \frac{1}{acn}\sum_{j=1}^{b} Y_{.j..}^2 + \frac{Y_{....}^2}{N}$$

the SS for the interactions between factor A and factor B and

$$SS_{res} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{l=1}^{n} y_{ijkl}^2 - \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} Y_{ijk.}^2.$$

The expectations of the MS in this model are shown in Table 5.31, and the hypotheses

$$H_{0A}: a_i = 0,\quad H_{0B}: b_j = 0,\quad H_{0C}: c_{ijk} = 0,\quad H_{0AB}: (a,b)_{ij} = 0$$

can be tested by using the corresponding F-statistics, that is, the ratios with $MS_A$, $MS_B$, $MS_{C\ in\ AB}$ and $MS_{A \times B}$, respectively, as numerator and $MS_{res}$ as denominator.

As an example consider the fattening performance of offspring of beef cattle (factor C) of different genotypes (factor A) in several years (factor B). If each sire occurs just once, then the structure of Table 5.30 is given.


Table 5.30 Observations of a balanced three-way mixed classification, A cross-classified with B and C nested in the A x B combinations.

Levels of A   Levels of B
              $B_1$                                  $B_2$                                  ...   $B_b$
              Levels of C                            Levels of C                                  Levels of C
$A_1$         $C_{111}, C_{112},\dots,C_{11c}$       $C_{121}, C_{122},\dots,C_{12c}$       ...   $C_{1b1}, C_{1b2},\dots,C_{1bc}$
              $y_{1111}, y_{1121},\dots,y_{11c1}$    $y_{1211}, y_{1221},\dots,y_{12c1}$    ...   $y_{1b11}, y_{1b21},\dots,y_{1bc1}$
              ...
              $y_{111n}, y_{112n},\dots,y_{11cn}$    $y_{121n}, y_{122n},\dots,y_{12cn}$    ...   $y_{1b1n}, y_{1b2n},\dots,y_{1bcn}$
$A_2$         $C_{211}, C_{212},\dots,C_{21c}$       $C_{221}, C_{222},\dots,C_{22c}$       ...   $C_{2b1}, C_{2b2},\dots,C_{2bc}$
              $y_{2111}, y_{2121},\dots,y_{21c1}$    $y_{2211}, y_{2221},\dots,y_{22c1}$    ...   $y_{2b11}, y_{2b21},\dots,y_{2bc1}$
              ...
              $y_{211n}, y_{212n},\dots,y_{21cn}$    $y_{221n}, y_{222n},\dots,y_{22cn}$    ...   $y_{2b1n}, y_{2b2n},\dots,y_{2bcn}$
...
$A_a$         $C_{a11}, C_{a12},\dots,C_{a1c}$       $C_{a21}, C_{a22},\dots,C_{a2c}$       ...   $C_{ab1}, C_{ab2},\dots,C_{abc}$
              $y_{a111}, y_{a121},\dots,y_{a1c1}$    $y_{a211}, y_{a221},\dots,y_{a2c1}$    ...   $y_{ab11}, y_{ab21},\dots,y_{abc1}$
              ...
              $y_{a11n}, y_{a12n},\dots,y_{a1cn}$    $y_{a21n}, y_{a22n},\dots,y_{a2cn}$    ...   $y_{ab1n}, y_{ab2n},\dots,y_{abcn}$

Table 5.31 Analysis of variance table and expectations of the MS for model I of a balanced three-way analysis of variance, A cross-classified with B and C nested in the A x B combinations.


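Between A-levels: $SS_A = \frac{1}{bcn}\sum_{i=1}^{a} Y_{i...}^2 - \frac{Y_{....}^2}{N}$; df $a-1$; $MS_A = \frac{SS_A}{a-1}$; E(MS) under (5.52): $\sigma^2 + \frac{bcn}{a-1}\sum_{i=1}^{a} a_i^2$
Between B-levels: $SS_B = \frac{1}{acn}\sum_{j=1}^{b} Y_{.j..}^2 - \frac{Y_{....}^2}{N}$; df $b-1$; $MS_B = \frac{SS_B}{b-1}$; E(MS) under (5.52): $\sigma^2 + \frac{acn}{b-1}\sum_{j=1}^{b} b_j^2$
Between C-levels in A x B combinations: $SS_{C\ in\ AB} = \frac{1}{n}\sum_{i,j,k} Y_{ijk.}^2 - \frac{1}{cn}\sum_{i,j} Y_{ij..}^2$; df $ab(c-1)$; $MS_{C\ in\ AB} = \frac{SS_{C\ in\ AB}}{ab(c-1)}$; E(MS) under (5.52): $\sigma^2 + \frac{n}{ab(c-1)}\sum_{i,j,k} c_{ijk}^2$
Interaction A x B: $SS_{A \times B} = \frac{1}{cn}\sum_{i,j} Y_{ij..}^2 - \frac{1}{bcn}\sum_{i} Y_{i...}^2 - \frac{1}{acn}\sum_{j} Y_{.j..}^2 + \frac{Y_{....}^2}{N}$; df $(a-1)(b-1)$; $MS_{A \times B} = \frac{SS_{A \times B}}{(a-1)(b-1)}$; E(MS) under (5.52): $\sigma^2 + \frac{cn}{(a-1)(b-1)}\sum_{i,j} (a,b)_{ij}^2$
Residual: $SS_{res} = \sum_{i,j,k,l} y_{ijkl}^2 - \frac{1}{n}\sum_{i,j,k} Y_{ijk.}^2$; df $N-abc$; $MS_{res} = \frac{SS_{res}}{N-abc}$; E(MS): $\sigma^2$
Total: $SS_T = \sum_{i,j,k,l} y_{ijkl}^2 - \frac{Y_{....}^2}{N}$; df $N-1$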


Hints for Programs
Again we determine the minimal experimental size with OPDOE in R, using the same procedure as in the sections above:


> size.anova(model="(axb)>c", hypothesis="b", a=6, b=5,
+ c=4, alpha=0.05, beta=0.1, delta=0.5, cases="minimin")
n

3

> size.anova(model="(axb)>c", hypothesis="b", a=6, b=5,
+ c=4, alpha=0.05, beta=0.1, delta=0.5, cases="maximin")
n 

Ge. 
5.5 Exercises

5.1 Prove (a) to (d) of Lemma 5.4.

5.2 Analyse the data of Table 5.14 by SPSS or R, that is, compute the analysis of variance table and all F-values.

5.3 Analyse the data of Table 5.18 by SPSS or R, that is, compute the analysis of variance table and all F-values.

5.4 Prove that $X(X^TX)^{-}X^TX = X$.

5.5 Show that in Example 5.15 the differences $B_2 - B_3$, $B_1 - B_2$ and $I_n - B_1$ are idempotent and that $(B_2 - B_3)(B_1 - B_2) = (B_2 - B_3)(I_n - B_1) = 0$.

5.6 Install and load in R the program package OPDOE.

5.7 Compute with OPDOE in R for $\alpha = 0.025$, $\beta = 0.1$ and $\delta/\sigma = 1$ maximin and minimin of the one-way analysis of variance for a = 6.

5.8 Compute with OPDOE in R for $\alpha = 0.05$, $\beta = 0.1$ and $\delta/\sigma = 1$ maximin and minimin of the two-way cross-classification for testing factor A for a = 6 and b = 4.

5.9 Compute with OPDOE in R for $\alpha = 0.05$, $\beta = 0.1$ and $\delta/\sigma = 1$ maximin and minimin of the two-way nested classification for testing the factors A and B for a = 6 and b = 4.

5.10 Compute with OPDOE in R for $\alpha = 0.05$, $\beta = 0.1$ and $\delta/\sigma = 1$ maximin and minimin of the two-way cross-classification for testing the interactions A x B for a = 6 and b = 4.

5.11 Compute with OPDOE in R for $\alpha = 0.05$, $\beta = 0.1$ and $\delta/\sigma = 0.5$ maximin and minimin of the three-way cross-classification for testing factor A for a = 6, b = 5 and c = 4.

Analysis of Variance: Estimation of Variance Components
(Model II of the Analysis of Variance)


6.1 Introduction: Linear Models with Random Effects


In this chapter models of the analysis of variance (ANOVA) in which all factors are random are considered; we call the model in this case model II. Our aim in such models is not only, as in Chapter 5, the testing of particular hypotheses but also the estimation of the components of variance. For the latter we first of all consider the best elaborated case of the one-way analysis of variance. We again use the notation of Section 5.1 and consider formally the same models as in Chapter 5. The difference between Chapters 5 and 6 is that the effects of model II are random. We assume that, for instance, for a factor A exactly a levels are randomly selected from a universe $P_A$ of (infinitely many) levels of the factor A, so that the effects $a_1, \dots, a_a$ of these levels are random variables.

The terms main effect and interaction effect are defined analogously to Chapter 5, but these effects are now random variables and not parameters that could be estimated.

Models in which some effects are fixed and others are random are discussed in Chapter 7. In Chapter 6 some terms defined in Chapter 5 are used without defining them once more.


Definition 6.1 Let $Y = (y_1, \dots, y_N)^T$ be an N-dimensional random vector and $\beta = (\mu, \beta_1, \dots, \beta_k)^T$ a vector of elements that, except for $\mu$, are random variables. Further X as in (5.1) is an N x (k + 1) matrix of rank p < k + 1. The vector e is also an N-dimensional random vector of error terms. Then we call

$$Y = X\beta + e \quad (6.1)$$

a model II of the ANOVA if

$$var(e) = \sigma^2 I_N,\quad cov(\beta, e) = O_{k+1,N}\quad\text{and}\quad E(e) = 0_N,\quad E(\beta) = \begin{pmatrix}\mu\\ 0_k\end{pmatrix}.$$

We write (6.1) in the form

$$Y = \mu 1_N + Z\gamma + e \quad (6.2)$$

where Z consists of the columns two up to (k + 1) of X and $\gamma$ contains the second up to the (k + 1)-th element of $\beta$. Then we have $E(Y) = \mu 1_N$.
If in $\beta$ of (6.1) effects of r factors and factor combinations occur, we may write

$$\gamma^T = \left(\gamma_{A_1}^T, \gamma_{A_2}^T, \dots, \gamma_{A_r}^T\right)\quad\text{and}\quad Z = \left(Z_{A_1}, Z_{A_2}, \dots, Z_{A_r}\right).$$

In a two-way cross-classification with interactions and the factors A and B, we have, for instance, r = 3, $A = A_1$, $B = A_2$ and $AB = A_3$. In general, we have

$$Y = \mu 1_N + \sum_{i=1}^{r} Z_{A_i}\gamma_{A_i} + e. \quad (6.3)$$


Definition 6.2 Equation (6.3) under the side conditions of Definition 6.1 and the additional assumption that all elements of $\gamma_{A_i}$ are uncorrelated and have the same variance $\sigma_i^2$, so that $cov(\gamma_{A_i}, \gamma_{A_j}) = O_{a_i,a_j}$ for all i, j (i ≠ j) and $var(\gamma_{A_i}) = \sigma_i^2 I_{a_i}$, where $a_i$ is the number of levels of the factor $A_i$, is called a special model II of the analysis of variance; $\sigma_i^2$ and $\sigma^2$ are called components of variance or variance components.
From Definition 6.2 we get

$$var(Y) = \sum_{i=1}^{r} Z_{A_i}Z_{A_i}^T\,\sigma_i^2 + \sigma^2 I_N. \quad (6.4)$$


Theorem 6.1 If Y is an N-dimensional random variable so that (6.3) is a model II of the ANOVA following Definition 6.2, then for the quadratic form $Y^TAY$ with an N x N matrix A, we have

$$E\left(Y^TAY\right) = \mu^2\,1_N^T A 1_N + \sum_{i=1}^{r} \sigma_i^2\,tr\left(A Z_{A_i}Z_{A_i}^T\right) + \sigma^2\,tr(A). \quad (6.5)$$


Proof: We see that

$$E\left(Y^TAY\right) = tr[A\,var(Y)] + E\left(Y^T\right)A\,E(Y),$$

and because of $E(Y) = \mu 1_N$ and (6.4), (6.5) follows.

Theorem 6.1 allows us to calculate the expectations of the mean squares of an ANOVA based on model II, which is of importance to one of the methods for estimating the components $\sigma_i^2$ and $\sigma^2$.

Henderson (1953), Rao (1970, 1971a), Hartley and Rao (1967), Harville (1977), Drygas (1980) and Searle et al. (1992) developed methods for the estimation of variance components. A part of these methods can also be used for mixed models (Chapter 7). Henderson's ANOVA method works as follows: at first the ANOVA for the corresponding model I is calculated, including the ANOVA table but excluding the calculation of the E(MS). Then, using (6.5), the E(MS) are calculated for model II. These are functions of the variance components $\sigma_i^2$. The E(MS) are then equated to the observed MS, and the resulting equations are solved for $\sigma_i^2$. The solutions are used as estimates $\hat{\sigma}_i^2 = s_i^2$ of $\sigma_i^2$.

In this way differences between the MS occur, and these can be negative; consequently, negative estimates of the variance components can result from this method. That means that the method gives no estimators (or estimates) as defined in Definition 2.1. If the value of a variance component is small (near 0), negative estimates may often occur. Negative estimates may either mean that the estimated component is very small or they may be a signal of an inappropriately chosen model, for instance, if effects are nonadditive. The interpretation of negative estimates is discussed by Verdooren (1982).

In the following sections the method of Henderson is applied to several classifications. The estimator of a component is obtained by replacing all observed values in the corresponding equation of the estimate by the corresponding random variables.

Along with the estimation, tests of hypotheses about the variance components are described.

Besides the method of Henderson, three further methods are mainly in use. For normally distributed Y we can use the maximum likelihood method or a special version of it, the restricted maximum likelihood method (REML). Further we have the MINQUE method (Figure 6.1), minimising a matrix norm. We propose always to use REML.

Each of these four methods can be performed by SPSS via

Analyze
General Linear Model
Variance Components
Options

Figure 6.1 Methods of variance component estimation available in SPSS. Source: Reproduced with permission of IBM.
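Before we discuss special cases, some statements about the approximate distribution of linear combinations of $\chi^2$-distributed random variables must be made.

Let the random variables $u_1, \dots, u_k$ be independent of each other, and let $\frac{n_i u_i}{\sigma_i^2}$ be $CS(n_i)$-distributed. The variance of the linear combination

$$z = \sum_{i=1}^{k} c_i u_i \quad \left(c_i \text{ so that } \sum_{i=1}^{k} c_i\sigma_i^2 > 0\right)$$

is then

$$var(z) = 2\sum_{i=1}^{k} \frac{c_i^2\sigma_i^4}{n_i}.$$

We divide z by the weighted variance $\sigma_W^2 = \sum_{i=1}^{k} c_i\sigma_i^2$, $\sigma_W^2 > 0$, and we will approximate the distribution of $\frac{nz}{\sigma_W^2}$ for a certain n by a $\chi^2$-distribution, which has the same variance as $\frac{nz}{\sigma_W^2}$. This we achieve by putting (following Satterthwaite, 1946)

$$n = \frac{\sigma_W^4}{\sum_{i=1}^{k} \frac{c_i^2\sigma_i^4}{n_i}}$$

by Theorem 6.2 below.


Theorem 6.2 If the random variables $\frac{n_i u_i}{\sigma_i^2}$ $(i = 1,\dots,k)$ are independent of each other and $CS(n_i)$-distributed, then the random variable $\frac{nz}{\sigma_W^2}$ with

$$z = \sum_{i=1}^{k} c_i u_i,\quad \sigma_W^2 = \sum_{i=1}^{k} c_i\sigma_i^2 > 0\quad\text{and}\quad n = \frac{\sigma_W^4}{\sum_{i=1}^{k} \frac{c_i^2\sigma_i^4}{n_i}}$$

has the same variance as a CS(n)-distributed variable.


We already used this theorem for the Welch test in Chapter 3.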
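
The choice of n in Theorem 6.2 can be illustrated numerically. The following Python sketch is a minimal illustration; the function names are assumptions of this example. It evaluates n for given $c_i$, $\sigma_i^2$ and $n_i$ and checks that $var(nz/\sigma_W^2) = 2n$, using $var(u_i) = 2\sigma_i^4/n_i$. In applications the unknown $\sigma_i^2$ are usually replaced by the observed $u_i$.

```python
def satterthwaite_df(c, sigma2, n):
    """n = sigma_W^4 / sum_i (c_i^2 sigma_i^4 / n_i), as in Theorem 6.2."""
    sigma_w2 = sum(ci * s2 for ci, s2 in zip(c, sigma2))
    if sigma_w2 <= 0:
        raise ValueError("the weighted variance sigma_W^2 must be positive")
    denom = sum(ci ** 2 * s2 ** 2 / ni for ci, s2, ni in zip(c, sigma2, n))
    return sigma_w2 ** 2 / denom

def variance_of_scaled_z(c, sigma2, n):
    """var(n z / sigma_W^2), computed from var(u_i) = 2 sigma_i^4 / n_i."""
    nu = satterthwaite_df(c, sigma2, n)
    sigma_w2 = sum(ci * s2 for ci, s2 in zip(c, sigma2))
    var_z = 2 * sum(ci ** 2 * s2 ** 2 / ni for ci, s2, ni in zip(c, sigma2, n))
    return (nu / sigma_w2) ** 2 * var_z

# Hypothetical example: a difference-type combination with positive weighted variance
c, sigma2, n_i = [1.0, -0.5], [4.0, 2.0], [10, 6]
nu = satterthwaite_df(c, sigma2, n_i)
var_scaled = variance_of_scaled_z(c, sigma2, n_i)   # equals 2 * nu up to rounding
```

For a single component (k = 1, $c_1 = 1$) the formula returns $n = n_1$, as it must.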

Following Theorem 6.2 we can approximate a linear combination of independently $\chi^2$-distributed random variables by a $\chi^2$-distribution with appropriately chosen degrees of freedom. For instance, we see from Theorem 6.2 that

$$\left[\frac{nz}{CS\left(n, 1-\frac{\alpha}{2}\right)},\ \frac{nz}{CS\left(n, \frac{\alpha}{2}\right)}\right] \quad (6.6)$$

is an approximate confidence interval with coefficient $1 - \alpha$ for $\sigma_W^2$, if $\sigma_W^2$, z and n are chosen as in Theorem 6.2. Welch (1956) showed that for n > 0 a better approximate confidence interval than (6.6) can be found as

$$\left[\frac{nz}{A_{1-\frac{\alpha}{2}}},\ \frac{nz}{A_{\frac{\alpha}{2}}}\right] \quad (6.7)$$

where 


2 
Ay = CS(s7) ~ 3 (22]-¢ +1) a pane 
kG Fi 


For some cases Graybill and Wang (1980) found a further improvement. 


6.2 One-Way Classification


We consider Equation (6.3) for the case r = 1 and put $\gamma_{A_1} = (a_1, \dots, a_a)^T$ and $\sigma_1^2 = \sigma_a^2$. Then (6.3) can be written in the form

$$y_{ij} = \mu + a_i + e_{ij} \quad (i = 1,\dots,a;\ j = 1,\dots,n_i). \quad (6.8)$$

The side conditions of Definition 6.1 are $var(e_{ij}) = \sigma^2$, $var(a_i) = \sigma_a^2$, and that the $a_i$ are independent of each other and the $e_{ij}$ are independent of each other and of the $a_i$. From Example 5.1 with X in (6.8) and (6.4) it follows that

$$V = var(Y) = \bigoplus_{i=1}^{a}\left(1_{n_i,n_i}\sigma_a^2 + I_{n_i}\sigma^2\right). \quad (6.9)$$

For the case a = 3, $n_1 = n_2 = n_3 = 2$, the direct sum in (6.9) has the form

$$V = \left(1_{2,2}\sigma_a^2 + I_2\sigma^2\right) \oplus \left(1_{2,2}\sigma_a^2 + I_2\sigma^2\right) \oplus \left(1_{2,2}\sigma_a^2 + I_2\sigma^2\right)
= \begin{pmatrix}
\sigma_a^2+\sigma^2 & \sigma_a^2 & 0 & 0 & 0 & 0\\
\sigma_a^2 & \sigma_a^2+\sigma^2 & 0 & 0 & 0 & 0\\
0 & 0 & \sigma_a^2+\sigma^2 & \sigma_a^2 & 0 & 0\\
0 & 0 & \sigma_a^2 & \sigma_a^2+\sigma^2 & 0 & 0\\
0 & 0 & 0 & 0 & \sigma_a^2+\sigma^2 & \sigma_a^2\\
0 & 0 & 0 & 0 & \sigma_a^2 & \sigma_a^2+\sigma^2
\end{pmatrix}.$$

Lemma 6.1 If the square matrix V of order n has the form

$$V_n(a,b) = bI_n + a1_{n,n},$$

then its determinant is, if $b \neq -na$ and $b \neq 0$,

$$|V_n(a,b)| = b^{n-1}(b+na) \quad (6.10)$$

and further its inverse is

$$V_n^{-1}(a,b) = V_n\left(-\frac{a}{b(b+na)}, \frac{1}{b}\right) = \frac{1}{b}I_n - \frac{a}{b(b+na)}1_{n,n}.$$

Proof (Rasch and Herrendörfer, 1986): We subtract the last column of $V_n(a,b)$ from all the other columns and add the n - 1 first rows of the matrix generated in this way to the last row. Then the determinant of the resulting matrix equals (6.10).
If the inverse of $V_n(a,b)$ has the form $dI_n + c1_{n,n}$, then $V_n(a,b)(dI_n + c1_{n,n}) = I_n$, and this leads to $d = \frac{1}{b}$, $c = -\frac{a}{b(b+na)}$.
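
Lemma 6.1 can be verified for concrete values. The following Python sketch is only a check under assumed values (n = 4, a = 2, b = 3); it uses exact rational arithmetic, and the helpers `v_matrix`, `determinant` and `matmul` are names chosen for this illustration.

```python
from fractions import Fraction

def v_matrix(n, a, b):
    """V_n(a, b) = b I_n + a 1_{n,n} with exact rational entries."""
    return [[Fraction(a) + (Fraction(b) if i == j else 0) for j in range(n)]
            for i in range(n)]

def determinant(m):
    """Determinant by Gaussian elimination with row exchanges."""
    m = [row[:] for row in m]
    size, det = len(m), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for k in range(col, size):
                m[r][k] -= factor * m[col][k]
    return det

def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

n, a, b = 4, 2, 3
V = v_matrix(n, a, b)
det_V = determinant(V)                       # compare with b^(n-1) (b + n a)
d = Fraction(1, b)
c = Fraction(-a, b * (b + n * a))
V_inv = v_matrix(n, c, d)                    # d I_n + c 1_{n,n}
product = matmul(V, V_inv)                   # should be the identity matrix
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
```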
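Lemma 6.2 The eigenvalues of

$$V = \bigoplus_{i=1}^{a}\left(1_{n_i,n_i}\sigma_a^2 + I_{n_i}\sigma^2\right)$$

are, with $N = \sum_{i=1}^{a} n_i$,

$$\lambda_k = \begin{cases} n_k\sigma_a^2 + \sigma^2 & (k = 1,\dots,a),\\ \sigma^2 & (k = a+1,\dots,N).\end{cases}$$

The orthogonal eigenvectors are

$$r_k = \begin{cases} t_k & (k = 1,\dots,a),\\ s_k & (k = a+1,\dots,N)\end{cases}$$

with $(t_k) = T_N = \bigoplus_{i=1}^{a} 1_{n_i}$ and $(s_k) = S_N = \bigoplus_{i=1}^{a} S_i$, where $S_i$ is the $n_i \times (n_i - 1)$ matrix

$$S_i = \begin{pmatrix}
1 & 1 & 1 & \cdots & 1\\
-1 & 1 & 1 & \cdots & 1\\
0 & -2 & 1 & \cdots & 1\\
0 & 0 & -3 & \cdots & 1\\
\vdots & \vdots & \vdots & & \vdots\\
0 & 0 & 0 & \cdots & -(n_i-1)
\end{pmatrix}.$$

Proof: We have

$$|V - \lambda I_N| = \prod_{i=1}^{a}\left|V_{n_i}\left(\sigma_a^2, \sigma^2-\lambda\right)\right| = \prod_{i=1}^{a}\left|1_{n_i,n_i}\sigma_a^2 + I_{n_i}\left(\sigma^2-\lambda\right)\right|$$

and due to Lemma 6.1

$$|V - \lambda I_N| = \prod_{i=1}^{a}\left\{\left(\sigma^2-\lambda\right)^{n_i-1}\left(n_i\sigma_a^2 + \sigma^2 - \lambda\right)\right\} = \left(\sigma^2-\lambda\right)^{N-a}\prod_{i=1}^{a}\left(n_i\sigma_a^2 + \sigma^2 - \lambda\right).$$

This term has the (N - a)-fold zero $\lambda = \sigma^2$ and the a zeros $\lambda_i = n_i\sigma_a^2 + \sigma^2$, and this proves the first part.

Orthogonal eigenvectors must fulfil the conditions $Vr_k = \lambda_k r_k$ and $r_k^T r_{k'} = 0$ (k ≠ k'). We put $R = (r_1, \dots, r_N) = (T_N, S_N)$, where $T_N$ is an N x a matrix and $S_N$ an N x (N - a) matrix.

With $T_N = \bigoplus_{i=1}^{a} 1_{n_i}$ we get

$$VT_N = \bigoplus_{i=1}^{a}\left(n_i\sigma_a^2 + \sigma^2\right)1_{n_i}.$$

Further the columns of $T_N$ are orthogonal, and by this the columns of $T_N$ are the eigenvectors of the first a eigenvalues. For the N - a eigenvalues $\lambda_k = \sigma^2$ (k = a + 1, ..., N), we have

$$Vr_k = \sigma^2 r_k \quad (k = a+1,\dots,N)$$

or

$$\left(V - \sigma^2 I_N\right)r_k = 0 \quad (k = a+1,\dots,N)$$

or

$$T_N T_N^T r_k = 0 \quad (k = a+1,\dots,N).$$

With $S_N = \bigoplus_{i=1}^{a} S_i = (r_{a+1}, \dots, r_N)$ the last condition is fulfilled if $1_{n_i,n_i}S_i = 0$. From the orthogonality property, it follows that

$$S_i^T S_i = \begin{pmatrix}
2 & 0 & \cdots & 0\\
0 & 6 & \cdots & 0\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & n_i(n_i-1)
\end{pmatrix}.$$

With $S_i$ as given in the lemma, all conditions are fulfilled. Further the columns of $T_N$ and $S_N$ are orthogonal.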
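
The statement of Lemma 6.2 can be checked for a small case. The following Python sketch assumes the hypothetical values $n_1 = 2$, $n_2 = 3$, $n_3 = 1$, $\sigma_a^2 = 2$ and $\sigma^2 = 5$; the function names are chosen for this illustration only. It builds V from (6.9) and the eigenvectors $t_k$ and $s_k$ (listed block by block rather than in the order of the lemma) and verifies $Vr = \lambda r$ and the pairwise orthogonality with exact arithmetic.

```python
from fractions import Fraction

def covariance_matrix(sizes, s2a, s2):
    """V = direct sum of (1_{n_i,n_i} s2a + I_{n_i} s2), as in (6.9)."""
    block = [i for i, n in enumerate(sizes) for _ in range(n)]
    N = len(block)
    return [[(Fraction(s2a) if block[r] == block[c] else Fraction(0))
             + (Fraction(s2) if r == c else Fraction(0))
             for c in range(N)] for r in range(N)]

def eigen_pairs(sizes, s2a, s2):
    """(eigenvalue, eigenvector) pairs built from the columns of T_N and S_N."""
    N, pairs, start = sum(sizes), [], 0
    for n in sizes:
        t = [0] * N
        for r in range(start, start + n):
            t[r] = 1                       # column 1_{n_i} of T_N
        pairs.append((Fraction(n * s2a + s2), t))
        for m in range(1, n):              # m-th column of S_i: m ones, then -m
            s = [0] * N
            for r in range(start, start + m):
                s[r] = 1
            s[start + m] = -m
            pairs.append((Fraction(s2), s))
        start += n
    return pairs

def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

sizes, s2a, s2 = [2, 3, 1], 2, 5
V = covariance_matrix(sizes, s2a, s2)
pairs = eigen_pairs(sizes, s2a, s2)
eigen_ok = all(matvec(V, v) == [lam * x for x in v] for lam, v in pairs)
orthogonal_ok = all(
    sum(x * y for x, y in zip(pairs[p][1], pairs[q][1])) == 0
    for p in range(len(pairs)) for q in range(p + 1, len(pairs))
)
eigenvalues = sorted(lam for lam, _ in pairs)
```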
6.2.1 Estimation of Variance Components


For the one-way classification, several methods of estimation are described and compared with each other. The ANOVA method is the simplest one and stems from the originator of the ANOVA, R. A. Fisher. In Henderson's fundamental paper from 1953, it was mentioned as method I.


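6.2.1.1 Analysis of Variance Method

In Table 5.2 we find the SS, df and MS of a one-way analysis of variance. These terms are independent of the model; they are the same for model I and model II. But the E(MS) for model II differ from those of model I. Further we have to respect that for model II the $y_{ij}$ in (6.8) within the classes are not independent. We have, namely,

$$cov(y_{ij}, y_{ik}) = E\left[(y_{ij}-\mu)(y_{ik}-\mu)\right] = E\left[(a_i + e_{ij})(a_i + e_{ik})\right] \quad (j \neq k)$$

and from the side conditions of model II it follows:

$$cov(y_{ij}, y_{ik}) = E\left(a_i^2\right) = \sigma_a^2.$$

We call $cov(y_{ij}, y_{ik})$ the covariance within classes.


Definition 6.3 The correlation coefficient between two random variables $y_{ij}$ and $y_{ik}$ in the same class i of an experiment for which model II of the ANOVA as in (6.8) can be used is called within-class correlation coefficient and is given by

$$\rho_I = \frac{\sigma_a^2}{\sigma_a^2 + \sigma^2}.$$

The within-class correlation coefficient $\rho_I$ is independent of the special class i.
We now derive the E(MS) for model II. $E(MS_I) = E(MS_{res})$ is as in model I equal to $\sigma^2$. For $E(MS_A)$ it follows from model (6.8) that

$$E(MS_A) = \frac{1}{a-1}\left[E\left(\sum_{i}\frac{Y_{i.}^2}{n_i}\right) - E\left(\frac{Y_{..}^2}{N}\right)\right].$$

At first, we calculate $E(Y_{i.}^2)$. The model assumptions supply

$$E\left(Y_{i.}^2\right) = E\left[\left(n_i\mu + n_i a_i + \sum_{j} e_{ij}\right)^2\right] = n_i^2\mu^2 + n_i^2\sigma_a^2 + n_i\sigma^2$$

and by this

$$E\left(\sum_{i}\frac{Y_{i.}^2}{n_i}\right) = N\mu^2 + N\sigma_a^2 + a\sigma^2.$$

For $E\left(\frac{Y_{..}^2}{N}\right)$ we obtain, due to $Y_{..} = N\mu + \sum_{i=1}^{a} n_i a_i + \sum_{i,j} e_{ij}$,

$$E\left(\frac{Y_{..}^2}{N}\right) = N\mu^2 + \frac{\sigma_a^2}{N}\sum_{i=1}^{a} n_i^2 + \sigma^2.$$

And therefore we obtain

$$E(MS_A) = \sigma^2 + \frac{1}{a-1}\left(N - \frac{\sum_i n_i^2}{N}\right)\sigma_a^2. \quad (6.11)$$

If in (6.8) all $n_i = n$, then because of $\sum n_i^2 = n^2a$ and $N = a \cdot n$

$$E(MS_A) = \sigma^2 + n\sigma_a^2.$$

Then an unbiased estimator of $\sigma_a^2$ can simply be gained from $MS_A$ and $MS_{res}$ by

$$s_a^2 = \frac{1}{n}\left(MS_A - MS_{res}\right)$$

or

$$s_a^2 = \frac{1}{n}\left\{\frac{1}{a-1}\left(\sum_{i}\frac{Y_{i.}^2}{n} - \frac{Y_{..}^2}{N}\right) - \frac{1}{a(n-1)}\left(\sum_{i,j} y_{ij}^2 - \sum_{i}\frac{Y_{i.}^2}{n}\right)\right\}.$$

In general $s_a^2$ is given by

$$s_a^2 = \frac{a-1}{N - \frac{\sum_i n_i^2}{N}}\left(MS_A - MS_{res}\right). \quad (6.12)$$

The corresponding estimates are negative if $MS_{res} > MS_A$.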
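
The ANOVA method for the one-way classification can be sketched in a few lines. The following Python sketch is an illustration with hypothetical data; the function name `anova_variance_components` is an assumption of this example. It evaluates (6.12) for possibly unequal $n_i$ and shows a data set in which all class means coincide, so that $MS_A = 0 < MS_{res}$ and the estimate of $\sigma_a^2$ becomes negative.

```python
def anova_variance_components(groups):
    """Return (s2, s2_a): MS_res and the ANOVA-method estimate (6.12) of sigma_a^2."""
    a = len(groups)
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    class_term = sum(sum(g) ** 2 / len(g) for g in groups)   # sum_i Y_i.^2 / n_i
    grand_term = sum(sum(g) for g in groups) ** 2 / N        # Y..^2 / N
    ss_a = class_term - grand_term
    ss_res = sum(v * v for g in groups for v in g) - class_term
    ms_a = ss_a / (a - 1)
    ms_res = ss_res / (N - a)
    coefficient = N - sum(n * n for n in sizes) / N          # N - sum n_i^2 / N
    s2_a = (a - 1) / coefficient * (ms_a - ms_res)
    return ms_res, s2_a

# Hypothetical unbalanced data with clearly different class means
groups_1 = [[10.2, 9.8, 10.5], [12.1, 11.7], [8.9, 9.4, 9.1, 9.0]]
s2_1, s2_a_1 = anova_variance_components(groups_1)

# Hypothetical balanced data with identical class means: MS_A = 0 < MS_res
groups_2 = [[1.0, 5.0], [2.0, 4.0], [0.0, 6.0]]
s2_2, s2_a_2 = anova_variance_components(groups_2)
truncated = max(0.0, s2_a_2)   # non-negative but no longer unbiased
```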

As already mentioned, this approach of equating the calculated MS to the E(MS) is called the ANOVA method and can be used for any higher or nested classification. The corresponding estimators are gained by transition to random variables; these unbiased estimators can give negative estimates. Later we will also use estimators that are not unbiased but give non-negative estimates of the variance components; see, for example, REML.

If we are interested in an estimation in the sense of Definition 2.1 (mapping into $R^+$) and use $\max(0, s_a^2)$ as estimator, the unbiasedness is lost, but the MQD becomes smaller than for $s_a^2$. The matrix A of the quadratic form

$$Y^TAY = \sum_{i=1}^{a}\frac{Y_{i.}^2}{n_i} - \frac{Y_{..}^2}{N}$$

is

$$A = \bigoplus_{i=1}^{a}\frac{1}{n_i}1_{n_i,n_i} - \frac{1}{N}1_{N,N}.$$

From (6.9) we obtain

$$tr[A\,var(Y)] = \sum_{i=1}^{a}\left(n_i\sigma_a^2 + \sigma^2\right) - \frac{1}{N}\sum_{i=1}^{a}\left(n_i^2\sigma_a^2 + n_i\sigma^2\right) = \sigma_a^2\left(N - \frac{\sum_i n_i^2}{N}\right) + \sigma^2(a-1).$$

Further $E[Y^T]A\,E[Y] = 0$ because $E[Y] = \mu 1_N$ and $A1_N = 0_N$. By this we obtain again (6.11) from (6.5). The matrices A and var(Y) become unwieldy for higher classifications. For the case of equal subclass numbers, simple rules for the calculation of the E(MS) exist, which will be described in Chapter 7 for the general case of the mixed model as well as for specialisations for model II. The two methods presented below are really needed only for the case of unequal subclass numbers.


6.2.1.2 Estimators in Case of Normally Distributed Y 

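We assume now that the vector $Y$ of the $y_{ij}$ in (6.8) is $N(\mu 1_N, V)$-distributed with $V$ from (6.9). Further we assume $n_i = n$ $(i = 1, \dots, a)$, that is, $N = an$. From (6.10) and Lemma 6.1, it follows that

$$|V| = (\sigma^2)^{a(n-1)}(\sigma^2 + n\sigma_a^2)^{a}$$

and

$$V^{-1} = \bigoplus_{i=1}^{a}\left[\frac{1}{\sigma^2}I_n - \frac{\sigma_a^2}{\sigma^2(\sigma^2 + n\sigma_a^2)}1_{nn}\right]$$

with $a$ summands in the direct sum. The density function of $Y$ is

$$f(Y|\mu,\sigma^2,\sigma_a^2) = \frac{\exp\left[-\frac{1}{2}(Y-\mu 1_N)^TV^{-1}(Y-\mu 1_N)\right]}{(2\pi)^{N/2}|V|^{1/2}} = \frac{\exp\left\{-\frac{1}{2\sigma^2}(Y-\mu 1_N)^T(Y-\mu 1_N) + \frac{\sigma_a^2}{2\sigma^2(\sigma^2+n\sigma_a^2)}(Y-\mu 1_N)^T\left(\bigoplus_{i=1}^{a}1_{nn}\right)(Y-\mu 1_N)\right\}}{(2\pi)^{N/2}(\sigma^2)^{a(n-1)/2}(\sigma^2+n\sigma_a^2)^{a/2}}.$$

Because

$$(Y-\mu 1_N)^T(Y-\mu 1_N) = \sum_{i,j}\left(y_{ij} - \bar y_{i.} + \bar y_{i.} - \mu\right)^2 = SS_{res} + n\sum_i(\bar y_{i.} - \mu)^2$$

and

$$(Y-\mu 1_N)^T\left(\bigoplus_{i=1}^{a}1_{nn}\right)(Y-\mu 1_N) = \sum_i\left[\sum_j\left(y_{ij} - \bar y_{i.} + \bar y_{i.} - \mu\right)\right]^2 = n^2\sum_i(\bar y_{i.} - \bar y_{..})^2 + an^2(\bar y_{..} - \mu)^2,$$

this density becomes

$$f(Y|\mu,\sigma^2,\sigma_a^2) = \frac{\exp\left\{-\frac{1}{2}\left[\frac{SS_{res}}{\sigma^2} + \frac{SS_A}{\sigma^2+n\sigma_a^2} + \frac{an(\bar y_{..}-\mu)^2}{\sigma^2+n\sigma_a^2}\right]\right\}}{(2\pi)^{N/2}(\sigma^2)^{a(n-1)/2}(\sigma^2+n\sigma_a^2)^{a/2}}$$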

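with $SS_{res}$ and $SS_A$ from Theorem 5.3.
The maximum likelihood estimates $\tilde\sigma^2$, $\tilde\sigma_a^2$ and $\tilde\mu$ are obtained by setting the derivatives of $\ln L$ with respect to the three unknown parameters equal to zero. We obtain

$$0 = \frac{an(\bar y_{..} - \tilde\mu)}{\tilde\sigma^2 + n\tilde\sigma_a^2},$$

$$0 = -\frac{a(n-1)}{2\tilde\sigma^2} - \frac{a}{2(\tilde\sigma^2 + n\tilde\sigma_a^2)} + \frac{SS_{res}}{2\tilde\sigma^4} + \frac{SS_A}{2(\tilde\sigma^2 + n\tilde\sigma_a^2)^2},$$

$$0 = -\frac{na}{2(\tilde\sigma^2 + n\tilde\sigma_a^2)} + \frac{nSS_A}{2(\tilde\sigma^2 + n\tilde\sigma_a^2)^2}.$$

From the first equation (after transition to random variables), it follows for the estimators

$$\tilde\mu = \bar y_{..}$$

and from the two other equations

$$a(\tilde\sigma^2 + n\tilde\sigma_a^2) = SS_A$$

or

$$\tilde\sigma^2 = \frac{SS_{res}}{a(n-1)} = s^2 = MS_{res} \tag{6.13}$$

and

$$\tilde\sigma_a^2 = \frac{1}{n}\left[\frac{SS_A}{a} - MS_{res}\right] = \frac{1}{n}\left[\left(1 - \frac{1}{a}\right)MS_A - MS_{res}\right]. \tag{6.14}$$

Because the matrix of the second derivatives is negative definite, we reach maxima.
As is easy to see, $\tilde\mu$ and $s^2$ are unbiased for $\mu$ and $\sigma^2$. But $\tilde\sigma_a^2$ has, following (6.11), the expectation

$$E(\tilde\sigma_a^2) = \frac{1}{n}\left[\left(1 - \frac{1}{a}\right)(\sigma^2 + n\sigma_a^2) - \sigma^2\right] = \sigma_a^2 - \frac{1}{an}(\sigma^2 + n\sigma_a^2).$$

Because $\tilde\sigma_a^2$ is negative if $\left(1 - \frac{1}{a}\right)MS_A$ is smaller than $MS_{res}$, $(\tilde\sigma^2, \tilde\sigma_a^2)$ is in general no MLE concerning $(\sigma^2, \sigma_a^2)$, because following Chapter 2 the maximum must be taken with respect to $\Omega$, that is, for all $\theta \in R^1 \times (R^+)^2$.
Herbach (1959) showed that taking the maximum over this parameter space leads, besides $\tilde\mu = \bar y_{..}$, to

$$\tilde\sigma_a^2 = \begin{cases}\dfrac{1}{n}\left[\left(1 - \dfrac{1}{a}\right)MS_A - MS_{res}\right], & \text{if } \left(1 - \dfrac{1}{a}\right)MS_A \ge MS_{res}\\[2mm] 0 & \text{otherwise}\end{cases} \tag{6.15}$$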


and 


1 
; 2, if | 1-— ] MS, > MS x0 
Cc = ; : ( ;) . . (6.16) 


0 otherwise 


Both estimators are biased. 
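Using the notation $SS_{res}$ and $SS_A$ of Theorem 5.3, the exponent in the exponential function of $f(Y|\mu,\sigma^2,\sigma_a^2)$ is equal to

$$-\frac{SS_{res}}{2\sigma^2} - \frac{SS_A}{2(\sigma^2 + n\sigma_a^2)} - \frac{an(\bar y_{..} - \mu)^2}{2(\sigma^2 + n\sigma_a^2)} = \eta_1M_1(Y) + \eta_2M_2(Y) + \eta_3M_3(Y) + A(\eta)$$

where $A(\eta)$ only depends on $\theta$.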
This is the canonical form of a three parametric exponential family of full 
rank with 


_ 1 7 n _ n 
BES gga ee 2(0? +n02)’ a 2(o? + no?) 
and 
M,(Y) = Yip M2(Y) = S_ 97, M3(Y) = 
i=1j=1 i=1 


By this, following the conclusions of Chapters 1 and 2, $(M_1(Y), M_2(Y), M_3(Y))$ is a UVUE of $(\eta_1, \eta_2, \eta_3)$.


6.2.1.3 REML Estimation 
The method REML can be found in Searle et al. (1992). We describe this estimation generally in Chapter 7 for mixed models. The method means that the
likelihood function of TY is maximised, where T is a (N - a - 1) x N-matrix,
whose rows are N — a — 1 linear independent rows of Tee NOOR 
The (natural) logarithm of the likelihood function of TY is

$$\ln L = -\frac{1}{2}(N-a-1)\ln(2\pi) - \frac{1}{2}(N-a-1)\ln\sigma^2 - \frac{1}{2}\ln\left|T\frac{V}{\sigma^2}T^T\right| - \frac{1}{2\sigma^2}Y^TT^T\left(T\frac{V}{\sigma^2}T^T\right)^{-1}TY.$$

Now we differentiate this function with respect to $\sigma^2$ and $\sigma_a^2/\sigma^2$ and set the derivatives equal to zero. The resulting equations are solved iteratively and give the estimates.
Because the matrix of second derivatives is negative definite, we find maxima.
This method is increasingly used in applications, even for variables that are not normally distributed. The REML method is equivalent to an iterative MINQUE, which is discussed in the next section.


6.2.1.4 Matrix Norm Minimising Quadratic Estimation 
We look now for quadratic estimators for $\sigma_a^2$ and $\sigma^2$ that are unbiased and invariant against translations of the vector $Y$ and have minimal variance for the case that $\sigma_a^2 = \lambda\sigma^2$ with known $\lambda > 0$. By this the estimators are LVUE in the sense of Definition 2.3 in the class of the translation-invariant quadratic estimators.

We start with the general model (6.8) with the covariance matrix var(Y) = V in (6.9) and put

$$\frac{\sigma_a^2}{\sigma^2} = \lambda, \quad \lambda \in R^+.$$

Theorem 6.3 For model (6.8) under the corresponding side conditions,

$$S_a^2 = \frac{1}{(N-1)K - L^2}\left[(N - 1 - 2\lambda L + \lambda^2K)Q_1 - (L - \lambda K)Q_2\right], \tag{6.17}$$

$$S^2 = \frac{1}{(N-1)K - L^2}\left[KQ_2 - (L - \lambda K)Q_1\right] \tag{6.18}$$

is at $\lambda \in R^+$ an LVUE concerning $(\sigma_a^2, \sigma^2)^T$ in the class $\mathcal{K}$ of all estimators of the quadratic form $Q = Y^TAY$ that have finite second moments and are invariant against transformations of the form $X = Y + a$ with a constant $(n \times 1)$-vector $a$. Here the symbols $L$, $K$, $Q_1$, $Q_2$ in (6.17) and (6.18) are defined as follows:
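Initially let

$$k_i = \frac{n_i}{n_i\lambda + 1} \quad (i = 1, \dots, a)$$

and

$$L = \sum_i k_i - \frac{\sum_i k_i^2}{\sum_i k_i}, \quad K = \sum_i k_i^2 - 2\frac{\sum_i k_i^3}{\sum_i k_i} + \frac{\left(\sum_i k_i^2\right)^2}{\left(\sum_i k_i\right)^2},$$

$$Q_1 = \sum_i k_i^2\left(\bar y_{i.} - \tilde y_{..}\right)^2 \quad \text{with} \quad \tilde y_{..} = \frac{\sum_i k_i\bar y_{i.}}{\sum_i k_i} \tag{6.19}$$

and

$$Q_2 = \sum_i \frac{k_i^2}{n_i}\left(\bar y_{i.} - \tilde y_{..}\right)^2 + SS_{res} \tag{6.20}$$

with $SS_{res}$ from Section 5.2.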
The proof of this theorem is from Rao (1971b) and is not repeated here. 


6.2.1.5 Comparison of Several Estimators 

Which of the estimators offered should be applied in practice? Methods leading to negative estimates of quantities that are positive by definition are not estimators in the strict sense, because they do not map into the parameter space, and they are often not accepted. In practice the estimation of $\sigma_a^2$ is often done following Herbach's approach with a truncated estimation analogous to (6.15), but contrary to (6.16), $s^2 = MS_{res}$ is always used. By this we lose the unbiasedness of the estimator of $\sigma_a^2$.

For the special case of equal subclass numbers $n_i = n$ $(i = 1, \dots, a)$, we have


Theorem 6.4 The estimators of the ANOVA method

$$s^2 = MS_{res} \tag{6.21}$$

and $s_a^2$ following (6.12) and the LVUE (6.17) and (6.18) for $(\sigma_a^2, \sigma^2)$ are identical for $n_i = n$. In this case the LVUE do not depend on $\lambda = \sigma_a^2/\sigma^2$ and because of this are also UVUE in the class $\mathcal{K}$.


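Proof: Initially, for $n_i = n$, (6.12) becomes

$$s_a^2 = \frac{1}{n}(MS_A - MS_{res}). \tag{6.22}$$

The constants in (6.17) and (6.18) simplify for $n_i = n$ as follows:

$$k_i = \frac{n}{n\lambda + 1}, \quad \sum_i k_i^2 = \frac{an^2}{(n\lambda + 1)^2}, \quad \sum_i k_i^3 = \frac{an^3}{(n\lambda + 1)^3}.$$

By this we obtain

$$L = \frac{(a-1)n}{n\lambda + 1}, \quad K = \frac{(a-1)n^2}{(n\lambda + 1)^2}$$

and further

$$(N-1)K - L^2 = \frac{(N-a)(a-1)n^2}{(n\lambda + 1)^2}.$$

Finally

$$N - 1 - 2\lambda L + \lambda^2K = N - a + \frac{a-1}{(n\lambda + 1)^2}.$$

Because in our special case $\tilde y_{..} = \bar y_{..}$, (6.19) and (6.20) simplify to

$$Q_1 = \frac{n}{(n\lambda + 1)^2}SS_A, \quad Q_2 = \frac{1}{(n\lambda + 1)^2}SS_A + SS_{res}.$$

By this $S_a^2$ in (6.17) becomes

$$S_a^2 = \frac{(n\lambda + 1)^2}{(N-a)(a-1)n^2}\left\{\left[N - a + \frac{a-1}{(n\lambda + 1)^2}\right]\frac{n}{(n\lambda + 1)^2}SS_A - \frac{(a-1)n}{(n\lambda + 1)^2}\left[\frac{1}{(n\lambda + 1)^2}SS_A + SS_{res}\right]\right\} = \frac{1}{n}(MS_A - MS_{res})$$

and this is independent of $\lambda$ and identical with $s_a^2$ in (6.12). Analogously follows from (6.18) the relation

$$S^2 = MS_{res} = s^2.$$

By this we propose to proceed in the case of equal subclass numbers ($n_i = n$) by estimating $\sigma_a^2$, analogously to (6.15), by

$$\begin{cases} s_a^2 = \dfrac{1}{n}(MS_A - MS_{res}), & \text{if } MS_A > MS_{res}\\[2mm] 0 & \text{otherwise}\end{cases}$$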
and $\sigma^2$ by $MS_{res}$ via (6.21). These estimators are biased but have small MSD.

But how to act in the case of unequal subclass numbers? How good are the MINQUE estimators if we use a wrong $\lambda$-value? Often we have no idea how to choose $\lambda$. What is the consequence of 'unbalancedness' (the inequality of the $n_i$) for the UVUE property? Empirical results are given in Ahrens (1983). MINQUE can of course be used iteratively or adaptively by starting with some a priori values for the variance components and choosing the new estimates as a priori information for the next step. Such an 'iterative MINQUE' often converges to the REML estimates given in Section 6.2.1.3. For this, see Searle et al. (1992). Rasch and Mašata (2006) compared the four methods above and some more by simulation with unbalanced data. They found nearly no differences; the total variance was best estimated by REML and MINQUE.
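The truncated estimation proposed above for equal subclass numbers can be sketched as follows. The function name `truncated_anova_estimates` and the numbers are hypothetical; the sketch only restates the rule "use $(MS_A - MS_{res})/n$ if it is positive, otherwise 0" together with $s^2 = MS_{res}$.

```python
def truncated_anova_estimates(ms_a, ms_res, n):
    """Truncated ANOVA-type estimates for equal subclass numbers n.

    Returns (estimate of sigma_a^2, estimate of sigma^2).
    The first component is never negative but no longer unbiased.
    """
    s2_a = max(0.0, (ms_a - ms_res) / n)
    return s2_a, ms_res


# Hypothetical mean squares: a positive and a truncated case
print(truncated_anova_estimates(12.0, 3.0, 3))
print(truncated_anova_estimates(2.0, 3.0, 3))
```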


6.2.2 Tests of Hypotheses and Confidence Intervals 


To construct confidence intervals for $\sigma_a^2$ and $\sigma^2$ and to test hypotheses about these variance components, we need, as in Section 6.2.1.2, a further side condition in model equation (6.6) about the distribution of $y_{ij}$. We assume again that the $y_{ij}$ are $N(\mu, \sigma_a^2 + \sigma^2)$-distributed. Then for the distribution of $MS_A$ and $MS_{res}$, we use the following theorem for the special case of equal subclass numbers.


Theorem 6.5 Let the random vector $Y$ following the model equation (6.8) with $n_1 = \dots = n_a = n$ for its components $y_{ij}$ be $N(\mu 1_N, V)$-distributed. Here, $V = \operatorname{var}(Y)$ is given by (6.9). Then the quadratic forms $\frac{SS_{res}}{\sigma^2} = u_1$ and $\frac{SS_A}{\sigma^2 + n\sigma_a^2} = u_2$ are independent of each other and are $CS[a(n-1)]$- and $CS[a-1]$-distributed, respectively.


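Proof: We write

$$u_1 = Y^TA_1Y \quad \text{with} \quad A_1 = \frac{1}{\sigma^2}\left(I_N - \frac{1}{n}\bigoplus_{i=1}^{a}1_{nn}\right)$$

and

$$u_2 = Y^TA_2Y \quad \text{with} \quad A_2 = \frac{1}{\sigma^2 + n\sigma_a^2}\left(\frac{1}{n}\bigoplus_{i=1}^{a}1_{nn} - \frac{1}{N}1_{NN}\right).$$

Now, from (6.9) with $n_i = n$,

$$A_1V = \frac{1}{\sigma^2}\left\{\bigoplus_{i=1}^{a}\left[\sigma^2I_n + \sigma_a^2 1_{nn}\right] - \frac{1}{n}\bigoplus_{i=1}^{a}\left[\sigma^2 1_{nn} + n\sigma_a^2 1_{nn}\right]\right\} = I_N - \frac{1}{n}\bigoplus_{i=1}^{a}1_{nn} \tag{6.23}$$

and this is an idempotent matrix using

$$1_{nn}1_{nn} = n1_{nn}. \tag{6.24}$$

Further

$$A_2V = \frac{1}{\sigma^2 + n\sigma_a^2}\left\{\frac{1}{n}\bigoplus_{i=1}^{a}(\sigma^2 + n\sigma_a^2)1_{nn} - \frac{1}{N}(\sigma^2 + n\sigma_a^2)1_{NN}\right\} = \frac{1}{n}\bigoplus_{i=1}^{a}1_{nn} - \frac{1}{N}1_{NN} \tag{6.25}$$

and this is idempotent.
We now only have to show that $A_1VA_2 = 0$, but this follows from

$$(\sigma^2 + n\sigma_a^2)A_1VA_2 = \left(I_N - \frac{1}{n}\bigoplus_{i=1}^{a}1_{nn}\right)\left(\frac{1}{n}\bigoplus_{i=1}^{a}1_{nn} - \frac{1}{N}1_{NN}\right) = 0.$$

Because $\operatorname{rk}(A_1) = N - a = a(n-1)$ and $\operatorname{rk}(A_2) = a - 1$, the proof of Theorem 6.5 is completed.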
From Theorem 6.5, it follows that

Corollary 6.1 Under the assumptions of Theorem 6.5,

$$F = \frac{SS_A}{SS_{res}}\cdot\frac{a(n-1)\sigma^2}{(a-1)(\sigma^2 + n\sigma_a^2)} \tag{6.26}$$

is $F[a-1, a(n-1)]$-distributed. Under the null hypothesis $H_0: \sigma_a^2 = 0$, this becomes

$$F = \frac{SS_A}{SS_{res}}\cdot\frac{a(n-1)}{a-1} \tag{6.27}$$

and this is $F[a-1, a(n-1)]$-distributed.

Corollary 6.1 allows us to use $F$ in (6.27) to test the null hypothesis $H_0: \sigma_a^2 = 0$. The test statistic (6.27) is identical with that in (5.11), and under the corresponding null hypothesis both test statistics have the same distribution. If the null hypothesis is wrong, then in the case $\sigma_a^2 > 0$ the statistic $F$ in (6.27) is the $\frac{\sigma^2 + n\sigma_a^2}{\sigma^2}$-fold of a centrally F-distributed random variable. By this we can construct confidence intervals for the variance components. Because $u_1$ is $CS[a(n-1)]$-distributed,

$$\left[\frac{SS_{res}}{CS\left[a(n-1)\,\middle|\,1 - \frac{\alpha}{2}\right]},\ \frac{SS_{res}}{CS\left[a(n-1)\,\middle|\,\frac{\alpha}{2}\right]}\right] \tag{6.28}$$

is a $(1-\alpha)$-confidence interval for $\sigma^2$ if $n_1 = \dots = n_a = n$. From Corollary 6.1 it follows that

$$\left[\frac{MS_A - MS_{res}F_{1-\frac{\alpha}{2}}}{MS_A + (n-1)MS_{res}F_{1-\frac{\alpha}{2}}},\ \frac{MS_A - MS_{res}F_{\frac{\alpha}{2}}}{MS_A + (n-1)MS_{res}F_{\frac{\alpha}{2}}}\right] \tag{6.29}$$

with $F_\varepsilon = F[a-1, a(n-1)\,|\,\varepsilon]$ is a $(1-\alpha)$-confidence interval for $\frac{\sigma_a^2}{\sigma^2 + \sigma_a^2}$. An approximate confidence interval for $\sigma_a^2$ in the case of unequal subclass numbers is given by Seely and Lee (1994).
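The interval (6.29) can be evaluated with a short function once the two F-quantiles are known. In this sketch the quantiles are passed in by the caller (taken from tables or from any statistics package); the function name `intraclass_ci` and the numerical values are hypothetical.

```python
def intraclass_ci(ms_a, ms_res, n, f_upper, f_lower):
    """Confidence limits (6.29) for sigma_a^2 / (sigma^2 + sigma_a^2).

    f_upper: quantile F[a-1, a(n-1) | 1 - alpha/2]  (supplied by the caller)
    f_lower: quantile F[a-1, a(n-1) | alpha/2]      (supplied by the caller)
    """
    lower = (ms_a - ms_res * f_upper) / (ms_a + (n - 1) * ms_res * f_upper)
    upper = (ms_a - ms_res * f_lower) / (ms_a + (n - 1) * ms_res * f_lower)
    return lower, upper


# Hypothetical mean squares and quantiles, for illustration only
lo, hi = intraclass_ci(12.0, 3.0, 4, 4.0, 0.25)
```

The lower limit becomes 0 exactly when $MS_A/MS_{res}$ equals the upper quantile, which mirrors the F-test of $H_0: \sigma_a^2 = 0$ at the corresponding level.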


6.2.3 Variances and Properties of the Estimators of the 
Variance Components 


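As we have seen, estimators from the ANOVA method are unbiased concerning the two variance components. From (6.11) and (6.12) we get

$$E(s_a^2) = \sigma_a^2$$

and

$$E(s^2) = \sigma^2.$$

Now we need the variances of the estimators $s_a^2$ and $s^2$. By the analysis of variance method, all estimators of the variance components are linear combinations of the MS. From Theorem 6.5 it follows that $MS_A$ and $MS_{res}$ are stochastically independent if all subclass numbers are equal. In this case we have $\operatorname{cov}(MS_{res}, MS_A) = 0$ and therefore

$$\operatorname{var}(s^2) = \operatorname{var}(MS_{res}),$$

$$\operatorname{var}(s_a^2) = \frac{1}{n^2}\left[\operatorname{var}(MS_A) + \operatorname{var}(MS_{res})\right]. \tag{6.30}$$

In the case where $Y$ is $N(\mu 1_N, V)$-distributed, it follows from Theorem 6.5 that

$$\operatorname{var}\left(\frac{SS_{res}}{\sigma^2}\right) = 2a(n-1) = \frac{1}{\sigma^4}\operatorname{var}(SS_{res}).$$

This immediately leads to

$$\operatorname{var}(s^2) = \operatorname{var}(MS_{res}) = \frac{2\sigma^4}{a(n-1)}. \tag{6.31}$$

Analogously

$$\operatorname{var}\left(\frac{SS_A}{\sigma^2 + n\sigma_a^2}\right) = 2(a-1) = \frac{\operatorname{var}(SS_A)}{(\sigma^2 + n\sigma_a^2)^2}$$

and

$$\operatorname{var}(MS_A) = \frac{2(\sigma^2 + n\sigma_a^2)^2}{a-1}. \tag{6.32}$$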



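From (6.31), (6.32) and (6.30), we obtain, if $Y$ is $N(\mu 1_N, V)$-distributed,

$$\operatorname{var}(s_a^2) = \frac{2}{n^2}\left[\frac{(\sigma^2 + n\sigma_a^2)^2}{a-1} + \frac{\sigma^4}{a(n-1)}\right]. \tag{6.33}$$

We summarise this in

Theorem 6.6 Under the conditions of Theorem 6.5, the variances of $s_a^2 = \frac{1}{n}(MS_A - MS_{res})$ and $s^2 = MS_{res}$ are given by (6.33) and (6.31), respectively. Further

$$\operatorname{cov}(s^2, s_a^2) = \frac{-2\sigma^4}{na(n-1)}. \tag{6.34}$$

The relation for the covariance follows because

$$\operatorname{cov}(s^2, s_a^2) = \operatorname{cov}\left[MS_{res}, \frac{1}{n}(MS_A - MS_{res})\right] = -\frac{1}{n}\operatorname{var}(MS_{res})$$

and from (6.31).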

Estimators for the variances and covariances in (6.31), (6.33) and (6.34) can be obtained by replacing the quantities $\sigma^2$ and $\sigma_a^2$ occurring in these formulae by their estimators $\hat\sigma^2 = s^2$ and $\hat\sigma_a^2 = s_a^2$. These estimators of the variances and of the covariance are biased. It can easily be seen that

$$\widehat{\operatorname{var}}(s^2) = \frac{2s^4}{a(n-1) + 2}, \tag{6.35}$$

$$\widehat{\operatorname{var}}(s_a^2) = \frac{2}{n^2}\left[\frac{(s^2 + ns_a^2)^2}{a+1} + \frac{s^4}{a(n-1) + 2}\right] \tag{6.36}$$

and

$$\widehat{\operatorname{cov}}(s^2, s_a^2) = \frac{-2s^4}{n[a(n-1) + 2]} \tag{6.37}$$

are unbiased concerning $\operatorname{var}(s^2)$, $\operatorname{var}(s_a^2)$ and $\operatorname{cov}(s^2, s_a^2)$ because, if $z = \frac{f \cdot MS_x}{E(MS_x)}$ is $CS(f)$-distributed, then $\operatorname{var}(z) = 2f$ and by this

$$\operatorname{var}(MS_x) = \frac{2}{f}[E(MS_x)]^2.$$

Further

$$E(MS_x^2) - [E(MS_x)]^2 = \frac{2}{f}[E(MS_x)]^2$$

(in more detail see the proof of Theorem 6.10).
In the case of unequal subclass numbers, Theorem 6.5 cannot be applied. But Formula (6.31) was derived independently of Theorem 6.5 and is therefore valid for unequal subclass numbers if we replace $a(n-1)$ by $N - a$.
Deriving the formulae for $\operatorname{var}(s_a^2)$ and $\operatorname{cov}(s^2, s_a^2)$ for unequal $n_i$ is cumbersome. The derivation can be found in Hammersley (1949) and by another method in Hartley (1967). Townsend (1968, appendix IV) gives a derivation for the case $\mu = 0$. For the proof of the following theorem, we therefore refer to these references.
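The estimators (6.35) to (6.37) for the balanced case use only $MS_A = s^2 + ns_a^2$, $MS_{res} = s^2$, $a$ and $n$. A minimal sketch, with a hypothetical function name and hypothetical input values, is:

```python
from fractions import Fraction


def estimated_variances(ms_a, ms_res, a, n):
    """Unbiased estimators (6.35)-(6.37) in the balanced one-way model II.

    Returns (var_hat(s^2), var_hat(s_a^2), cov_hat(s^2, s_a^2)).
    """
    f_res = a * (n - 1)                      # degrees of freedom of MS_res
    var_s2 = 2 * ms_res**2 / (f_res + 2)     # (6.35)
    var_s2_a = 2 * (ms_a**2 / (a + 1) + ms_res**2 / (f_res + 2)) / n**2  # (6.36)
    cov = -2 * ms_res**2 / (n * (f_res + 2))  # (6.37)
    return var_s2, var_s2_a, cov


# Hypothetical values; Fractions keep the arithmetic exact
result = estimated_variances(Fraction(10), Fraction(2), a=4, n=3)
```

The denominators $a + 1$ and $a(n-1) + 2$ are the degrees of freedom plus 2, which is exactly the correction that follows from $E(MS_x^2) = \frac{f+2}{f}[E(MS_x)]^2$.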


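Theorem 6.7 The random vector $Y$ with the components in model equation (6.8) is assumed to be $N(\mu 1_N, V)$-distributed; $V = \operatorname{var}(Y)$ is given by (6.9). Then for $s_a^2$ in (6.12), we receive

$$\operatorname{var}(s_a^2) = \frac{2\left[N^2\sum n_i^2 + \left(\sum n_i^2\right)^2 - 2N\sum n_i^3\right]}{\left(N^2 - \sum n_i^2\right)^2}\sigma_a^4 + \frac{4N}{N^2 - \sum n_i^2}\sigma_a^2\sigma^2 + \frac{2N^2(N-1)(a-1)}{\left(N^2 - \sum n_i^2\right)^2(N-a)}\sigma^4. \tag{6.38}$$

Further

$$\operatorname{var}(s^2) = \frac{2\sigma^4}{N-a}, \tag{6.39}$$

$$\operatorname{cov}(s^2, s_a^2) = \frac{-2(a-1)N}{(N-a)\left(N^2 - \sum n_i^2\right)}\sigma^4. \tag{6.40}$$

For $n_i = n$ we obtain the Formulae (6.31) to (6.33). If $\mu = 0$, we get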
2 IN 
var (s,”) = MZ eo +20°6°N + oS A , (6.41) 
where 
ge ae ys ! ss (6.42) 
a N\4en, N-a"' ' 


is the ML-estimator of 0° if w= Ow. 
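The statement that (6.38) and (6.40) reduce to the balanced formulae (6.33) and (6.34) for $n_i = n$ can be checked numerically. The following sketch uses exact rational arithmetic; the function names and the parameter values are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def var_s2_a_unbalanced(sizes, sigma2_a, sigma2):
    """var(s_a^2) following (6.38) for subclass numbers `sizes`."""
    a, N = len(sizes), sum(sizes)
    q2 = sum(n**2 for n in sizes)
    q3 = sum(n**3 for n in sizes)
    d = N**2 - q2
    t_aa = Fraction(2 * (N**2 * q2 + q2**2 - 2 * N * q3), d**2)
    t_ae = Fraction(4 * N, d)
    t_ee = Fraction(2 * N**2 * (N - 1) * (a - 1), d**2 * (N - a))
    return t_aa * sigma2_a**2 + t_ae * sigma2_a * sigma2 + t_ee * sigma2**2


def cov_unbalanced(sizes, sigma2):
    """cov(s^2, s_a^2) following (6.40)."""
    a, N = len(sizes), sum(sizes)
    d = N**2 - sum(n**2 for n in sizes)
    return Fraction(-2 * (a - 1) * N, (N - a) * d) * sigma2**2


def var_s2_a_balanced(a, n, sigma2_a, sigma2):
    """var(s_a^2) following (6.33)."""
    return Fraction(2, n**2) * (Fraction((sigma2 + n * sigma2_a)**2, a - 1)
                                + Fraction(sigma2**2, a * (n - 1)))


# Hypothetical parameters: a = 5 classes of size n = 4, sigma_a^2 = 3, sigma^2 = 7
same_var = var_s2_a_unbalanced([4] * 5, 3, 7) == var_s2_a_balanced(5, 4, 3, 7)
same_cov = cov_unbalanced([4] * 5, 7) == Fraction(-2 * 7**2, 4 * 5 * 3)
```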


Example 6.1 Table 6.1 shows milk fat performances $y_{ij}$ (in kg) of the daughters of ten sires randomly selected from a corresponding population. The proportion of the variance of this trait in the population that is due to the sires shall be estimated, as well as the variance of this estimator, the variance of the estimator of the residual variance and the covariance between the two estimators. Table 6.2 is the ANOVA table, and Table 6.3 contains the estimates.

Table 6.1 Milk fat performances y_ij of daughters of 10 sires.

         Sire (bull)
         B1      B2      B3      B4      B5      B6      B7      B8      B9      B10
         120     152     130     149     110     157     119     150     144     159
         155     144     138     107     142     107     158     135     112     105
         131     147     123     143     124     146     140     150     123     103
         130     103     135     133     109     133     108     125     121     105
         140     131     138     139     154     104     138     104     132     144
         140     102     152     102     135     119     154     150     144     129
         142     102     159     103     118     107     156     140     132     119
         146     150     128     110     116     138     145     103     129     100
         130     159     137     103     150     147     150     132     103     115
         152     132     144     138     148     152     124     128     140     146
         115     102     154             138     124     100     122     106     108
         146     160                     115     142             154     152     119
n_i      12      12      11      10      12      12      11      12      12      12
Y_i.     1647    1584    1538    1227    1559    1576    1492    1593    1538    1452
mean     137.25  132.00  139.82  122.70  129.92  131.33  135.64  132.75  128.17  121.00


Table 6.2 Analysis of variance table (SPSS output) for the data of Table 6.1 of Example 6.1.

Tests of between-subjects effects
Dependent variable: milk

Source              Type III sum of squares    df     Mean square    F        Sig.
Sire   Hypothesis          3609.106              9      401.012      1.272    0.261
       Error              33426.032            106      315.340


Table 6.3 Results of the variance component estimation using four methods.

Method                  s_a^2    s^2       var(s_a^2)   var(s^2)   cov(s_a^2, s^2)
Analysis of variance    7.388    315.34
MINQUE                  8.171    315.35
ML                      3.248    316.03    199.45       1883.95    -161.99
REML                    6.802    315.90    271.26       1882.59    -162.06

According to the ANOVA method, $s_a^2 = 7.388$. In the ANOVA table, we find $s^2 = 315.34$. The within-class correlation coefficient $\rho_I$ is estimated as

$$r_I = \frac{7.388}{322.728} = 0.023.$$

The ANOVA table is calculated by SPSS via

Analyze
General Linear Model
Variance Components

At first we receive the data and, via OPTIONS, the possible methods of estimation as shown in Figure 6.2.

By using the button 'Model' and setting the sum of squares to type I, we get this result.

In the SPSS output in Table 6.2, Sig. leads to the rejection of the null hypothesis if its value is smaller than or equal to the chosen risk of the first kind $\alpha$.

We now estimate the variance components with SPSS using all methods available in this program, as given in Figure 6.1.

Again we set SS to type I. In the window arising after this, we select the corresponding method (Figure 6.1). We obtain the results of Table 6.3.

As we see, the results, except for the variance component of factor A in ML, differ only inessentially from each other.

Figure 6.2 The data of Example 6.1 and the possible methods of estimation in SPSS.




6.3 Estimators of Variance Components in the 
Two-Way and Three-Way Classification 


In this section, we consider only the ANOVA method. In case of unequal subclass numbers, there are methods already shown in Section 6.2, which can be calculated with SPSS. But as in Section 6.1.2 for the one-way ANOVA, we cannot say here either that one of these methods is uniformly better than the ANOVA method; however, in practice the REML method is increasingly used. Readers interested in this method are referred to Searle et al. (1992) and Ahrens (1983).
For the following we need

Definition 6.4 Let $Y$ be a random variable with a distribution depending on the parameter (vector) $\theta$.

Let $\hat\theta$ be an unbiased estimator of $\theta$ that is a quadratic function of $Y$. If $\hat\theta$ has minimal variance amongst all unbiased estimators quadratic in $Y$ with finite second moments, then $\hat\theta$ is called best quadratic unbiased estimator (BQUE) of $\theta$.


6.3.1 General Description for Equal and Unequal Subclass Numbers 


Definition 6.5 For a special model II of the ANOVA in Definition 6.2 and for correspondingly structured other models, we speak about a balanced case if for each factor the subclass numbers in the levels are equal and in nested classifications the number of levels of the nested factors is equal for each level of the superior factor.

Balanced cases are, for instance, the cross-classification with equal subclass numbers and the nested classification with equal number of levels of the inferior factor and equal subclass numbers.

In the one-way classification in (6.3), $r = 1$, and for $Z_{A_1} = Z = \bigoplus_{i=1}^{a} e_n$ we have

$$e_N^TZ = ne_a^T, \quad Ze_a = e_N. \tag{6.43}$$


The general approach of the ANOVA method in the balanced case, as already said, is to look for the ANOVA table (except the column for E(MS)) of the corresponding model I in Chapter 5. Now the E(MS) for model II are calculated, and the MS are formally equated to the E(MS). The solutions of the resulting simultaneous equations are the estimates of the variance components. The estimators are given by transition to the corresponding random variables. The factors of the variance components in the E(MS) can be found by using the rules of Chapter 7. We denote by $q = (MS_1, \dots, MS_r)^T$ the vector of the MS in an ANOVA table, by $(\sigma_1^2, \dots, \sigma_r^2)^T$ the vector of the variance components and by $K$ the non-singular matrix of the factors $k_{ij}$ so that $E(q) = K(\sigma_1^2, \dots, \sigma_r^2)^T$. The random solutions of $q = K(\sigma_1^2, \dots, \sigma_r^2)^T$ are used as an estimator $(s_1^2, \dots, s_r^2)^T$ of $(\sigma_1^2, \dots, \sigma_r^2)^T$, and we get

$$(s_1^2, \dots, s_r^2)^T = K^{-1}q. \tag{6.44}$$
Without proof we give the following theorem: 


Theorem 6.8 (Graybill) 
In an ANOVA for a special linear model of the form (6.3), we have in any 
balanced case: 


1) The estimator (6.44) is in the case that y,, in (6.3) have finite third and 
fourth moments and are equal for all elements of y,, (and for each j) 
a BQUE. 

2) The estimator (6.44) for normally distributed random variables Y is the best 
(unbiased) estimator. 


The proof of this theorem can be found in Graybill (1954). 
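The unbiasedness follows immediately from

$$E\left[(s_1^2, \dots, s_r^2)^T\right] = K^{-1}E(q) = K^{-1}K(\sigma_1^2, \dots, \sigma_r^2)^T = (\sigma_1^2, \dots, \sigma_r^2)^T.$$

The covariance matrix of the estimator (6.44) is

$$\operatorname{var}\left[(s_1^2, \dots, s_r^2)^T\right] = K^{-1}\operatorname{var}(q)(K^T)^{-1}.$$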


Theorem 6.9 Let (6.3) be a special model of the ANOVA following Definition 6.2 and let $Y$ in (6.3) be N-dimensional normally distributed. Then in the balanced case, for the $SS_i$ of the corresponding ANOVA (see Chapter 5) with the degrees of freedom $\nu_i$ $(i = 1, \dots, r+1;\ SS_{r+1} = SS_{res})$, the quadratic forms

$$\frac{SS_i}{E(MS_i)} = Y^TA_iY$$

with the positive semidefinite matrices $A_i$ of rank $\nu_i$ are independent of each other and $CS(\nu_i)$-distributed.

The proof of this theorem can be obtained with the help of Theorem 4.6 by showing that $A_iV$ is idempotent, $A_iVA_j = 0$ for $i \ne j$ and $\mu 1_N^TA_i1_N\mu = 0$.

Theorem 6.10 If $Y$ in (6.3) in the balanced case is N-dimensional normally distributed, then for $(s_1^2, \dots, s_r^2)^T$ in (6.44)

$$\operatorname{var}\left[(s_1^2, \dots, s_r^2)^T\right] = K^{-1}D(K^T)^{-1}$$

with the diagonal matrix $D$ having the elements $\frac{2}{\nu_i}[E(MS_i)]^2$. The $\nu_i$ are the degrees of freedom of $MS_i$ for $i = 1, \dots, r+1$. Further

$$\widehat{\operatorname{var}}\left[(s_1^2, \dots, s_r^2)^T\right] = K^{-1}\hat D(K^T)^{-1}$$

with the diagonal matrix $\hat D$ having the elements $\frac{2}{\nu_i + 2}MS_i^2$ is an unbiased estimator for $\operatorname{var}\left[(s_1^2, \dots, s_r^2)^T\right]$.

Proof: From Theorem 6.9 it follows that in the balanced case the $\frac{\nu_iMS_i}{E(MS_i)}$ $(i = 1, \dots, r+1)$ are independent of each other and $CS(\nu_i)$-distributed. Therefore, from $\operatorname{var}(\chi^2) = 2n$ for each $CS(n)$-distributed random variable $\chi^2$,

$$\operatorname{var}\left(\frac{\nu_iMS_i}{E(MS_i)}\right) = \frac{\nu_i^2}{[E(MS_i)]^2}\operatorname{var}(MS_i) = 2\nu_i$$

and from this follows, because of $\operatorname{cov}(MS_i, MS_j) = 0$ for all $i \ne j$, the stated form of $D$.

Because

$$\operatorname{var}(MS_i) = E(MS_i^2) - [E(MS_i)]^2 = \frac{2}{\nu_i}[E(MS_i)]^2,$$

we have

$$E(MS_i^2) = [E(MS_i)]^2\,\frac{\nu_i + 2}{\nu_i}$$

and $\frac{2}{\nu_i + 2}MS_i^2$ is unbiased concerning $\frac{2}{\nu_i}[E(MS_i)]^2$, and we get $E(\hat D) = D$.


We consider now the unbalanced case, that is, models for which (6.43) is not valid. As already said, we restrict ourselves to the ANOVA method because it is simple to calculate and no uniformly better method exists; but see Ahrens (1983).

The analogy is as follows. The $SS_i$ in the balanced case can be written as linear combinations of squares of the components of $Y$ and of partial sums of these components. We denote these elements in the $SS_i$ written as linear combinations by $S_{A_i}$, where the $A_i$ are the factors or factor combinations in (6.3) ($S_{A_0} = S_\mu$ is assigned to $\mu$). Analogously to the $S_{A_i}$ for the balanced case, the corresponding $S_{A_i}$ for the unbalanced case are calculated as follows:

$$S_\mu = \frac{(1_N^TY)^2}{N}, \quad S_{res} = Y^TY = \sum y_{ijk\dots}^2, \quad S_{A_i} = \sum_{j=1}^{a_i}\frac{Y^2(A_{ij})}{N(A_{ij})} \quad (i = 1, \dots, r). \tag{6.45}$$

In (6.45) $Y(A_{ij})$ is the sum of the components of $Y$ in the j-th level of the factor (or factor combination) $A_i$, $N(A_{ij})$ is the number of summands in $Y(A_{ij})$ and $a_i$ is the number of levels of $A_i$.

The $S_{A_i}$ are transformed to quasi-SS with the help of the linear combinations derived for the balanced case. Putting these quasi-SS or the corresponding quasi-MS equal to their expectations leads to simultaneous equations. The solutions are the estimates of the variance components from the ANOVA method for the unbalanced case. The denotation quasi-SS was chosen because these quadratic forms are not always positive definite and by this not sums of squared deviations. For the estimation of the variance components this is, however, irrelevant.

For the derivation of the simultaneous equations, we need the expectations of the quasi-SS and by this of the $S_{A_i}$. Denoting by $k(\sigma_j^2, S_{A_i})$ the coefficients of $\sigma_j^2$ in the expectation of $S_{A_i}$ $(i, j = 1, \dots, r)$, we can calculate these coefficients following Hartley (1967) (see also Hartley and Rao, 1967). We put

$$S_{A_i} = \sum_{j=1}^{a_i}\frac{1}{N(A_{ij})}Y^2(A_{ij}) = Y^TB_iY = S_{A_i}(Y)$$

and use $Z_{A_i} = [z_1(A_i), \dots, z_{a_i}(A_i)]$ with the column vectors $z_j(A_i)$ $(j = 1, \dots, a_i)$. Then we have

$$k_{ij} = k(\sigma_j^2, S_{A_i}) = \sum_{l=1}^{a_j}S_{A_i}[z_l(A_j)]. \tag{6.46}$$

For the derivation of (6.46), we refer to Hartley (1967). The coefficients of $\sigma^2$ are equal to $a_i$ and we have further

$$E(S_{res}) = E(Y^TY) = N\left(\mu^2 + \sum_{j=1}^{r}\sigma_j^2 + \sigma^2\right).$$

If in the balanced case the formulae

$$SS_i = \sum_{j=1}^{r}c_{ij}S_{A_j} + c_{i,r+1}S_{res}, \quad SS_{res} = \sum_{j=1}^{r}c_{r+1,j}S_{A_j} + c_{r+1,r+1}S_{res} \tag{6.47}$$

are valid for the calculation of the SS, the quasi-SS ($QSS_i$) in the unbalanced case are also calculated by (6.47). Let $C$ be the positive definite matrix of the coefficients of $\sigma_j^2$ in the expectations of $QSS_i$ ($i$ row index, $j$ column index), $a^*$ the vector of the coefficients of $\sigma^2$, $\Sigma$ the vector of the variance components $\sigma_i^2$ $(i = 1, \dots, r)$ and $S$ the vector of the $QSS_i$, so we gain the simultaneous equations

$$E\begin{pmatrix} S\\ SS_{res}\end{pmatrix} = \begin{pmatrix} C & a^*\\ 0_r^T & N-p\end{pmatrix}\begin{pmatrix}\Sigma\\ \sigma^2\end{pmatrix} \tag{6.48}$$

where $p$ is the number of the subclasses with at least one observation. The matrix of coefficients comes from (6.47), (6.46) and the corresponding formulae for the SS in the balanced case. From (6.48) we get the estimation equations by the ANOVA method in the form

$$\begin{pmatrix} S\\ SS_{res}\end{pmatrix} = \begin{pmatrix} C & a^*\\ 0_r^T & N-p\end{pmatrix}\begin{pmatrix}\hat\Sigma\\ s^2\end{pmatrix} \tag{6.49}$$

where $\hat\Sigma^T = (s_1^2, \dots, s_r^2)$. From (6.49) we obtain

$$s^2 = \frac{1}{N-p}SS_{res} \tag{6.50}$$

and

$$\hat\Sigma = C^{-1}(S - s^2a^*). \tag{6.51}$$

Formulae for the variances (and estimators of the variances) of $s^2$ and $s_i^2$ can be found in Searle (1971).
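

6.3.2 Two-Way Cross-Classification


In the two-way cross-classification, our model following Definition 6.2 is

$$y_{ijk} = \mu + a_i + b_j + (a,b)_{ij} + e_{ijk} \quad (i = 1, \dots, a;\ j = 1, \dots, b;\ k = 1, \dots, n_{ij}) \tag{6.52}$$

with the side conditions that $a_i$, $b_j$, $(a,b)_{ij}$ and $e_{ijk}$ are uncorrelated and

$$E(a_i) = E(b_j) = E((a,b)_{ij}) = E(a_ib_j) = E(a_i(a,b)_{ij}) = E(b_j(a,b)_{ij}) = 0,$$

$$E(e_{ijk}) = E(a_ie_{ijk}) = E(b_je_{ijk}) = E((a,b)_{ij}e_{ijk}) = 0 \quad \text{for all } i, j, k,$$

$$\operatorname{var}(a_i) = \sigma_a^2 \text{ for all } i, \quad \operatorname{var}(b_j) = \sigma_b^2 \text{ for all } j,$$

$$\operatorname{var}((a,b)_{ij}) = \sigma_{ab}^2 \text{ for all } i, j, \quad \operatorname{var}(e_{ijk}) = \sigma^2 \text{ for all } i, j, k.$$

For testing and constructing confidence intervals, we additionally assume that $y_{ijk}$ is normally distributed.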
A special case of Theorem 6.9 is

Theorem 6.11 In a balanced two-way cross-classification ($n_{ij} = n$ for all $i, j$) with model II and normally distributed $y_{ijk}$, the sums of squares in Table 5.13 are stochastically independent, and we have

$$\frac{SS_A}{bn\sigma_a^2 + n\sigma_{ab}^2 + \sigma^2} \text{ is } CS(a-1)\text{-distributed},$$

$$\frac{SS_B}{an\sigma_b^2 + n\sigma_{ab}^2 + \sigma^2} \text{ is } CS(b-1)\text{-distributed},$$

$$\frac{SS_{AB}}{n\sigma_{ab}^2 + \sigma^2} \text{ is } CS[(a-1)(b-1)]\text{-distributed}.$$

Theorem 6.11 allows us to test the hypotheses

$$H_{A0}: \sigma_a^2 = 0, \quad H_{B0}: \sigma_b^2 = 0, \quad H_{AB0}: \sigma_{ab}^2 = 0.$$

Theorem 6.12 With the assumptions of Theorem 6.11, the test statistic

$$F_A = \frac{SS_A}{SS_{AB}}(b-1)$$

is the $\frac{bn\sigma_a^2 + n\sigma_{ab}^2 + \sigma^2}{n\sigma_{ab}^2 + \sigma^2}$-fold of a random variable distributed as $F[a-1, (a-1)(b-1)]$. If $H_{A0}$ is true, $F_A$ is $F[a-1, (a-1)(b-1)]$-distributed.
The statistic

$$F_B = \frac{SS_B}{SS_{AB}}(a-1)$$

is the $\frac{an\sigma_b^2 + n\sigma_{ab}^2 + \sigma^2}{n\sigma_{ab}^2 + \sigma^2}$-fold of a random variable distributed as $F[b-1, (a-1)(b-1)]$. If $H_{B0}$ is true, $F_B$ is $F[b-1, (a-1)(b-1)]$-distributed.
The statistic

$$F_{AB} = \frac{SS_{AB}}{SS_{res}}\cdot\frac{ab(n-1)}{(a-1)(b-1)}$$

is the $\frac{n\sigma_{ab}^2 + \sigma^2}{\sigma^2}$-fold of a random variable distributed as $F[(a-1)(b-1), ab(n-1)]$. If $H_{AB0}$ is true, $F_{AB}$ is $F[(a-1)(b-1), ab(n-1)]$-distributed.

The proof follows from Theorem 6.11. The hypotheses $H_{A0}$, $H_{B0}$ and $H_{AB0}$ are tested by the statistics $F_A$, $F_B$ and $F_{AB}$, respectively. If the observed F-values are larger than the $(1-\alpha)$-quantiles of the central F-distribution with the corresponding degrees of freedom, we may conjecture that the corresponding variance component is positive and not zero.
To derive Theorem 6.11 from Theorem 6.9, we need (for the balanced case)

$$\begin{aligned}
E(MS_A) &= bn\sigma_a^2 + n\sigma_{ab}^2 + \sigma^2\\
E(MS_B) &= an\sigma_b^2 + n\sigma_{ab}^2 + \sigma^2\\
E(MS_{AB}) &= n\sigma_{ab}^2 + \sigma^2\\
E(MS_{res}) &= \sigma^2
\end{aligned} \tag{6.53}$$

(see Exercise 6.4).
Table 6.4 is the ANOVA table of the balanced case. With (6.53) the ANOVA method provides the estimators of the variance components of the balanced case

$$s^2 = MS_{res}, \quad s_{ab}^2 = \frac{1}{n}(MS_{AB} - MS_{res}), \quad s_b^2 = \frac{1}{an}(MS_B - MS_{AB}), \quad s_a^2 = \frac{1}{bn}(MS_A - MS_{AB}). \tag{6.54}$$

Formula (6.54) is a special case of (6.44), because (6.53) generates $K$ in (6.44) as

$$K = \begin{pmatrix} bn & 0 & n & 1\\ 0 & an & n & 1\\ 0 & 0 & n & 1\\ 0 & 0 & 0 & 1\end{pmatrix}.$$

We get $|K| = abn^3$ and

$$K^{-1} = \frac{1}{abn}\begin{pmatrix} a & 0 & -a & 0\\ 0 & b & -b & 0\\ 0 & 0 & ab & -ab\\ 0 & 0 & 0 & abn\end{pmatrix}.$$
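That (6.54) is the solution of $q = K(\sigma_a^2, \sigma_b^2, \sigma_{ab}^2, \sigma^2)^T$ can be illustrated by solving the triangular system directly. The following sketch uses back substitution with exact fractions; the function names and the mean squares are hypothetical.

```python
from fractions import Fraction


def solve_upper_triangular(K, q):
    """Solve K s = q for an upper triangular matrix K by back substitution."""
    m = len(q)
    s = [Fraction(0)] * m
    for i in reversed(range(m)):
        rest = sum(K[i][j] * s[j] for j in range(i + 1, m))
        s[i] = (Fraction(q[i]) - rest) / K[i][i]
    return s


def two_way_estimates(ms_a, ms_b, ms_ab, ms_res, a, b, n):
    """ANOVA estimates (s_a^2, s_b^2, s_ab^2, s^2) in the balanced two-way model II."""
    K = [[b * n, 0, n, 1],
         [0, a * n, n, 1],
         [0, 0, n, 1],
         [0, 0, 0, 1]]
    return solve_upper_triangular(K, [ms_a, ms_b, ms_ab, ms_res])


# Hypothetical mean squares for a = 3, b = 4, n = 2
estimates = two_way_estimates(50, 40, 10, 4, a=3, b=4, n=2)
```

For these inputs the solution coincides with the closed form (6.54): $s_a^2 = (50-10)/8$, $s_b^2 = (40-10)/6$, $s_{ab}^2 = (10-4)/2$ and $s^2 = 4$.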


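From Theorem 6.10 the variances of the estimators $s_a^2$, $s_b^2$, $s_{ab}^2$ and $s^2$ are obtained as follows. At first we calculate the diagonal matrix $D$ from (6.53) or from Table 6.4:

$$d_{11} = \frac{2}{a-1}\left(bn\sigma_a^2 + n\sigma_{ab}^2 + \sigma^2\right)^2, \quad d_{22} = \frac{2}{b-1}\left(an\sigma_b^2 + n\sigma_{ab}^2 + \sigma^2\right)^2,$$

$$d_{33} = \frac{2}{(a-1)(b-1)}\left(n\sigma_{ab}^2 + \sigma^2\right)^2, \quad d_{44} = \frac{2}{ab(n-1)}\sigma^4.$$

Table 6.4 Supplement of Table 5.13 for model II.

Source of variation    E(MS)                                          F
Between levels of A    $\sigma^2 + n\sigma_{ab}^2 + bn\sigma_a^2$     $(b-1)\,SS_A / SS_{AB}$
Between levels of B    $\sigma^2 + n\sigma_{ab}^2 + an\sigma_b^2$     $(a-1)\,SS_B / SS_{AB}$
Interactions           $\sigma^2 + n\sigma_{ab}^2$                    $ab(n-1)\,SS_{AB} / [(a-1)(b-1)\,SS_{res}]$
Residual               $\sigma^2$

From this we obtain the covariance matrix $V$ of the vector $(s_a^2, s_b^2, s_{ab}^2, s^2)^T$:

$$V = \begin{pmatrix}
\dfrac{d_{11} + d_{33}}{b^2n^2} & \dfrac{d_{33}}{abn^2} & \dfrac{-d_{33}}{bn^2} & 0\\[3mm]
\dfrac{d_{33}}{abn^2} & \dfrac{d_{22} + d_{33}}{a^2n^2} & \dfrac{-d_{33}}{an^2} & 0\\[3mm]
\dfrac{-d_{33}}{bn^2} & \dfrac{-d_{33}}{an^2} & \dfrac{d_{33} + d_{44}}{n^2} & \dfrac{-d_{44}}{n}\\[3mm]
0 & 0 & \dfrac{-d_{44}}{n} & d_{44}
\end{pmatrix}.$$

For instance, $\operatorname{var}(s^2) = \frac{2\sigma^4}{ab(n-1)}$ and

$$\operatorname{cov}(s_a^2, s_b^2) = \frac{2}{a(a-1)b(b-1)n^2}\left(n^2\sigma_{ab}^4 + 2n\sigma_{ab}^2\sigma^2 + \sigma^4\right).$$

Estimators of the elements of the covariance matrix $V$ are the elements of the matrix $\hat V$, which is gained from $V$ by replacing the $d_{ii}$ by $\hat d_{ii}$, where

$$\hat d_{11} = \frac{2}{a+1}MS_A^2, \quad \hat d_{22} = \frac{2}{b+1}MS_B^2,$$

$$\hat d_{33} = \frac{2}{(a-1)(b-1) + 2}MS_{AB}^2, \quad \hat d_{44} = \frac{2}{ab(n-1) + 2}MS_{res}^2.$$

For instance, we have

$$\widehat{\operatorname{var}}(s^2) = \frac{2}{ab(n-1) + 2}MS_{res}^2$$

and

$$\widehat{\operatorname{cov}}(s_a^2, s_b^2) = \frac{2}{[(a-1)(b-1) + 2]abn^2}MS_{AB}^2.$$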
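The estimated covariance matrix $\hat V = K^{-1}\hat D(K^T)^{-1}$ of Theorem 6.10 can be evaluated for the balanced two-way case as in the following sketch. The function name and the mean squares are hypothetical; the matrix $K^{-1}$ is the one given for the two-way cross-classification.

```python
from fractions import Fraction


def estimated_covariance_matrix(ms, a, b, n):
    """V_hat = K^{-1} D_hat (K^T)^{-1} for (s_a^2, s_b^2, s_ab^2, s^2).

    ms: tuple (MS_A, MS_B, MS_AB, MS_res).
    """
    mean_squares = [Fraction(x) for x in ms]
    dfs = [a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)]
    # Diagonal elements 2 MS_i^2 / (nu_i + 2)
    d_hat = [2 * m**2 / (f + 2) for m, f in zip(mean_squares, dfs)]
    c = Fraction(1, a * b * n)
    k_inv = [[c * a, 0, -c * a, 0],
             [0, c * b, -c * b, 0],
             [0, 0, c * a * b, -c * a * b],
             [0, 0, 0, c * a * b * n]]
    return [[sum(k_inv[i][k] * d_hat[k] * k_inv[j][k] for k in range(4))
             for j in range(4)] for i in range(4)]


# Hypothetical mean squares for a = 3, b = 4, n = 2
V_hat = estimated_covariance_matrix((50, 40, 10, 4), a=3, b=4, n=2)
```

The element `V_hat[3][3]` equals $\frac{2}{ab(n-1)+2}MS_{res}^2$ and `V_hat[0][1]` equals $\frac{2}{[(a-1)(b-1)+2]abn^2}MS_{AB}^2$, in line with the two instances written out above.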
The unbalanced case: In p classes at least one observation may be present 
(0 < p< ab). If p = ab, we assume that not all nj are equal. The quasi-SS (with 
the aid of 6.37) are analogue to the SS in Table 5.13 


QSS,=S1-S, 
eee eee (6.55) 
QSS 4p = Sap-S4-Szp +S, 
QSS 05 = Sres — Sag 
with 
Ni 
S.-Y?) Sr >PDEL 
ao (6.56) 


Sa= a Sz= ae Sap = aS se 


i=1lj= ij 


Analysis of Variance: Estimation of Variance Components | 323 


Here X* means that only summands with a non-zero denominator have been 
taken. Equation (6.56) is a special case of (6.45). The expectations of S,,, $4, Sz, 
Sz and E(QSS,.;) = (N - p)o” can be obtained by utilising the model equation 
(6.52) in the Formula (6.56) or from Formula (6.46). In the present case we get 


b 22 b 4 2 
ss Pj _ Wy (ab);, 2 
E(Sa) = Niwo+eE Nia? + 77 yd S 2 ] i we 


i=1 


b 12 

a a 

2 2 Li My 9 dja 2 2 

=Np* +No7+ s nN, Cet s O7np + 40 
i=1 i 


b 2 a a 2 
a N2 . N- a Yn 
E(S,) = Nye + 251 fg? 4 let J gp y Zui! fev 4? 


E(Sres) = N (v2 + 02 + 0% + 0%, +07) 


and by this 


b 
a é 2 a ; rn. b 2 
ae 8) 2 jal 9 dai=1Nj 
| + 0; 
L 


E(QSS 4) = 0° | Ni - 


b 49 


H(QS5,) | Daa" met os | 2st 


1d 


iT) 


b 
ie 
E(QSS 4p) = 0% Date ==) - 
J= 


b b 
2 2 
Desig plat” 


N ae mas SH 


a 
2 
NnNij 
i=l 7 2 
+0 
N; ’ 


+o0°(p-a-b+1) 
E(QSS,.5) = (N -p)o 
If all classes are occupied, we have $p = ab$.
We obtain estimators $s_a^2$, $s_b^2$, $s_{ab}^2$ and $s^2$ with the ANOVA method by replacing the E(QSS) by the QSS and the variance components by their estimators.
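
The quasi-SS of (6.55) and (6.56) can be sketched in a few lines of code. The following is a minimal illustration only, assuming the data are held in a dictionary that maps a cell (i, j) to its list of observations; the function name `quasi_ss` and the small data set are hypothetical and not taken from the text.

```python
def quasi_ss(cells):
    """Quasi-SS (6.55) for an unbalanced two-way cross-classification.

    cells: dict mapping (i, j) -> list of observations (assumed layout);
    empty cells may be omitted or given as empty lists.
    """
    cells = {key: obs for key, obs in cells.items() if obs}  # only occupied classes
    N = sum(len(obs) for obs in cells.values())
    p = len(cells)  # number of occupied classes

    row_sum, row_n, col_sum, col_n = {}, {}, {}, {}
    for (i, j), obs in cells.items():
        row_sum[i] = row_sum.get(i, 0.0) + sum(obs)
        row_n[i] = row_n.get(i, 0) + len(obs)
        col_sum[j] = col_sum.get(j, 0.0) + sum(obs)
        col_n[j] = col_n.get(j, 0) + len(obs)

    # The S-terms of (6.56); the starred sum runs over occupied classes only.
    S_mu = sum(row_sum.values()) ** 2 / N
    S_A = sum(row_sum[i] ** 2 / row_n[i] for i in row_sum)
    S_B = sum(col_sum[j] ** 2 / col_n[j] for j in col_sum)
    S_AB = sum(sum(obs) ** 2 / len(obs) for obs in cells.values())
    S_res = sum(y * y for obs in cells.values() for y in obs)

    return {
        "QSS_A": S_A - S_mu,
        "QSS_B": S_B - S_mu,
        "QSS_AB": S_AB - S_A - S_B + S_mu,
        "QSS_res": S_res - S_AB,
        "N": N,
        "p": p,
    }


# Hypothetical data: class (2, 2) is empty, so p = 3 < ab = 4.
example = {(1, 1): [3.0, 5.0], (1, 2): [4.0], (2, 1): [6.0, 7.0, 8.0]}
result = quasi_ss(example)
```

Unlike real sums of squares, $QSS_A$, $QSS_B$ and $QSS_{AB}$ need not be non-negative in the unbalanced case; only their sum, the corrected total sum of squares, always is.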


6.3.3 Two-Way Nested Classification 


The two-way nested classification is a special case of the incomplete two-way cross-classification; it is maximally disconnected. The formulae for the estimators of the variance components become very simple. We use the notation of Section 5.3.2, but now $a_i$ and $b_{ij}$ in (5.33) are random variables. The model equation (5.33) then becomes

$$y_{ijk} = \mu + a_i + b_{ij} + e_{ijk}, \quad (i = 1,\dots,a;\ j = 1,\dots,b_i;\ k = 1,\dots,n_{ij}) \tag{6.57}$$

with the side conditions of uncorrelated $a_i$, $b_{ij}$ and $e_{ijk}$ and

$$0 = E(a_i) = E(b_{ij}) = \operatorname{cov}(a_i, b_{ij}) = \operatorname{cov}(a_i, e_{ijk}) = \operatorname{cov}(b_{ij}, e_{ijk})$$

for all $i, j, k$.

The quasi-SS of the sections so far become real SS, because Theorem 5.10 is valid and independent of the special model. In Table 6.5 we find the E(MS). In this table, positive coefficients $\lambda_i$ occur, defined by


(6.58) 


Table 6.5 The column E(MS) of the two-way nested classification for model II (the other part of the analysis of variance table is given in Table 5.19).

Source of variation                   E(MS)
Between A-levels                      $\sigma^2 + \lambda_2\sigma_b^2 + \lambda_3\sigma_a^2$
Between B-levels within A-levels      $\sigma^2 + \lambda_1\sigma_b^2$
Within B-levels                       $\sigma^2$
We gain the coefficients in (6.58) either by deriving the E(MS) with the help of the model equation (6.57) or as special cases of the coefficients in the E(QSS) of the last sections. From the analysis of variance method, we obtain the estimators of the variance components by

$$\begin{aligned}
s^2 &= MS_{res}, \\
s_b^2 &= \frac{1}{\lambda_1}\left(MS_{B \text{ in } A} - MS_{res}\right), \\
s_a^2 &= \frac{1}{\lambda_3}\left(MS_A - \frac{\lambda_2}{\lambda_1}MS_{B \text{ in } A} - \left(1 - \frac{\lambda_2}{\lambda_1}\right)MS_{res}\right).
\end{aligned} \tag{6.59}$$
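
The estimators (6.59) can be illustrated numerically. The sketch below is not the book's own program: the function name and data layout are hypothetical, and the coefficients $\lambda_1, \lambda_2, \lambda_3$ are computed from the textbook-standard expressions for the unbalanced two-way nested model II, which are assumed here to be what (6.58) defines. The data are those of Table 6.10 (Exercise 6.1).

```python
def nested_variance_components(data):
    """ANOVA estimators (6.59) for the two-way nested classification.

    data: list over A-levels; each entry is a list over the B-levels within
    that A-level; each B-level is a list of observations (assumed layout).
    """
    n_ij = [[len(cell) for cell in level] for level in data]
    N_i = [sum(row) for row in n_ij]
    N = sum(N_i)
    a = len(data)
    B = sum(len(level) for level in data)  # total number of B-levels

    Y_ij = [[sum(cell) for cell in level] for level in data]
    Y_i = [sum(row) for row in Y_ij]
    Y = sum(Y_i)

    S_mu = Y ** 2 / N
    S_A = sum(y ** 2 / n for y, n in zip(Y_i, N_i))
    S_B = sum(y ** 2 / n for ys, ns in zip(Y_ij, n_ij) for y, n in zip(ys, ns))
    S_tot = sum(x * x for level in data for cell in level for x in cell)

    ss = {"A": S_A - S_mu, "B in A": S_B - S_A, "res": S_tot - S_B}
    df = {"A": a - 1, "B in A": B - a, "res": N - B}
    ms = {key: ss[key] / df[key] for key in ss}

    # Assumed standard coefficients of the E(MS) (cf. Table 6.5).
    q = sum(n ** 2 / Ni for ns, Ni in zip(n_ij, N_i) for n in ns)
    lam1 = (N - q) / (B - a)
    lam2 = (q - sum(n ** 2 for ns in n_ij for n in ns) / N) / (a - 1)
    lam3 = (N - sum(Ni ** 2 for Ni in N_i) / N) / (a - 1)

    s2 = ms["res"]
    s2_b = (ms["B in A"] - ms["res"]) / lam1
    s2_a = (ms["A"] - lam2 / lam1 * ms["B in A"]
            - (1 - lam2 / lam1) * ms["res"]) / lam3
    return {"ss": ss, "df": df, "ms": ms, "lambda": (lam1, lam2, lam3),
            "s2": s2, "s2_b": s2_b, "s2_a": s2_a}


# Data of Table 6.10: two boars, three sows each, fattening days of offspring.
boars = [
    [[93, 89, 97, 105], [107, 99], [109, 107, 94, 106]],
    [[89, 102, 104, 97], [87, 91, 82], [81, 83, 85, 91]],
]
estimates = nested_variance_components(boars)
```

The ANOVA method can yield negative values for `s2_a` or `s2_b`; the sketch returns them unchanged.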


With 
A, = (B. -a)A,, Ay = (a- 1)Ay, A3 = (a- 1)Asz, 


da = N+ NE tai | ae ZN] 


Ag = (4,-a) +N) [Nap (2 oy ee N) a] «(al 426) 


a b a 
—2(At +23) Bye go! a LY + My 
1 2 Lo N ij aa a , 


i=1lj= i= 
neo [araw 1)(a-1) a eee 
om (iene a b 
sand |S ome BN Som 
b=, j=1 i=l j=l 


and 
Ay = AP Ay Aro = AA (0, +45), 


the following formulae for the variances of the variance components result
under the assumption that the $y_{ijk}$ are normally distributed


var(s”) = c = o 
Tei 
var (s2) = — (Aso4 + Ago} + Azo* + 2Ago2.0% + 2go207 + 2A1007,07), 
143 
b n’ 
2 & wa yt rd) ae 
var (s7) = R242 S Yr = Sasa 
eT i=1 i=1lj= 


2(B.-a)(N-a)o* 


41,05,0° 
+4A,0,0° + N-B 


(6.60) 
and the covariances
, 2 
cov(s2,s”) = eee -(a- 0| et) 
(B.-a)var(s’) 
a 


cov (s,s?) = 


COV (s2, s}) 


M2 N2 | NN ylvti|% S. (6.61) 


i=1 | j=l ‘t 


as 


, 20@-VY(B--4) a 


N_B. J Ayvar (sj) 


6.3.4 Three-Way Cross-Classification with Equal Subclass Numbers 


We start with the model equation

$$y_{ijkl} = \mu + a_i + b_j + c_k + (a,b)_{ij} + (a,c)_{ik} + (b,c)_{jk} + (a,b,c)_{ijk} + e_{ijkl} \tag{6.62}$$

$$(i = 1,\dots,a;\ j = 1,\dots,b;\ k = 1,\dots,c;\ l = 1,\dots,n)$$

with the side conditions that the expectations of all random variables of the right-hand side of (6.62) are equal to zero and all covariances between different random variables of the right-hand side of (6.62) vanish. Further we assume for tests that the $y_{ijkl}$ are normally distributed. Table 6.6 is the ANOVA table for this case.


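Table 6.6 The column E(MS) as supplement for model II to the analysis of variance Table 5.21.

Source of variation                  E(MS)
Between A-levels                     $\sigma^2 + n\sigma_{abc}^2 + cn\sigma_{ab}^2 + bn\sigma_{ac}^2 + bcn\sigma_a^2$
Between B-levels                     $\sigma^2 + n\sigma_{abc}^2 + cn\sigma_{ab}^2 + an\sigma_{bc}^2 + acn\sigma_b^2$
Between C-levels                     $\sigma^2 + n\sigma_{abc}^2 + an\sigma_{bc}^2 + bn\sigma_{ac}^2 + abn\sigma_c^2$
Interaction A x B                    $\sigma^2 + n\sigma_{abc}^2 + cn\sigma_{ab}^2$
Interaction A x C                    $\sigma^2 + n\sigma_{abc}^2 + bn\sigma_{ac}^2$
Interaction B x C                    $\sigma^2 + n\sigma_{abc}^2 + an\sigma_{bc}^2$
Interaction A x B x C                $\sigma^2 + n\sigma_{abc}^2$
Within the subclasses (residual)     $\sigma^2$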
Following the ANOVA method, we obtain the estimators for the variance components from

$$\begin{aligned}
MS_A &= s^2 + ns_{abc}^2 + cns_{ab}^2 + bns_{ac}^2 + bcns_a^2 \\
MS_B &= s^2 + ns_{abc}^2 + cns_{ab}^2 + ans_{bc}^2 + acns_b^2 \\
MS_C &= s^2 + ns_{abc}^2 + ans_{bc}^2 + bns_{ac}^2 + abns_c^2 \\
MS_{AB} &= s^2 + ns_{abc}^2 + cns_{ab}^2 \\
MS_{AC} &= s^2 + ns_{abc}^2 + bns_{ac}^2 \\
MS_{BC} &= s^2 + ns_{abc}^2 + ans_{bc}^2 \\
MS_{ABC} &= s^2 + ns_{abc}^2 \\
MS_{res} &= s^2.
\end{aligned}$$
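
Because the system above is triangular, it can be solved by successive subtraction. The following minimal sketch shows this; the function name and the dictionary keys are hypothetical conventions chosen for illustration.

```python
def three_way_anova_estimators(ms, a, b, c, n):
    """Solve the ANOVA estimating equations of the balanced three-way
    cross-classification (model II) for the variance component estimates.

    ms: dict with the mean squares under the (assumed) keys
        "A", "B", "C", "AB", "AC", "BC", "ABC", "res".
    """
    s2 = ms["res"]
    s2_abc = (ms["ABC"] - ms["res"]) / n
    s2_ab = (ms["AB"] - ms["ABC"]) / (c * n)
    s2_ac = (ms["AC"] - ms["ABC"]) / (b * n)
    s2_bc = (ms["BC"] - ms["ABC"]) / (a * n)
    # Main effects: subtract both two-factor interactions, add back ABC.
    s2_a = (ms["A"] - ms["AB"] - ms["AC"] + ms["ABC"]) / (b * c * n)
    s2_b = (ms["B"] - ms["AB"] - ms["BC"] + ms["ABC"]) / (a * c * n)
    s2_c = (ms["C"] - ms["AC"] - ms["BC"] + ms["ABC"]) / (a * b * n)
    return {"res": s2, "abc": s2_abc, "ab": s2_ab, "ac": s2_ac,
            "bc": s2_bc, "a": s2_a, "b": s2_b, "c": s2_c}
```

As with every ANOVA-method estimator, individual results may be negative when a mean square is smaller than the one subtracted from it.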


Under the assumption of a normal distribution of the $y_{ijkl}$, the following theorem follows from Theorem 6.9.

Theorem 6.13 If for the $y_{ijkl}$ model equation (6.62) including its side conditions about expectations and covariances of the components of $y_{ijkl}$ is valid and the $y_{ijkl}$ are multivariate normally distributed with the marginal distributions

$$N\left(\mu,\ \sigma_a^2 + \sigma_b^2 + \sigma_c^2 + \sigma_{ab}^2 + \sigma_{ac}^2 + \sigma_{bc}^2 + \sigma_{abc}^2 + \sigma^2\right),$$

then the

$$\frac{SS_X}{E(MS_X)}$$

are $CS(df_X)$-distributed ($X = A, B, C, AB, AC, BC, ABC$) with $SS_X$, $E(MS_X)$ and $df_X$ from Table 5.21.

From Theorem 6.13 it follows that the F-values of the first column of Table 6.7 have the distribution given in the third column. By this we can test the hypotheses $H_{AB0}: \sigma_{ab}^2 = 0$, $H_{AC0}: \sigma_{ac}^2 = 0$, $H_{BC0}: \sigma_{bc}^2 = 0$, $H_{ABC0}: \sigma_{abc}^2 = 0$ with an F-test.

For testing the hypotheses $H_{A0}: \sigma_a^2 = 0$, $H_{B0}: \sigma_b^2 = 0$, $H_{C0}: \sigma_c^2 = 0$, we need


Lemma 6.3 (Satterthwaite, 1946)

If $z_1, \dots, z_k$ are independent of each other and $\frac{CS(n_i)E(z_i)}{n_i}$-distributed $(i = 1, \dots, k)$, then for real $a_i$

$$z = \sum_{i=1}^{k} a_i z_i$$

is with

$$n' = \frac{\left(\sum_{i=1}^{k} a_i z_i\right)^2}{\sum_{i=1}^{k} \dfrac{a_i^2 z_i^2}{n_i}} \tag{6.63}$$

approximately $\frac{CS(n')E(z)}{n'}$-distributed, if $E(z) > 0$.

This means that each realisation of $z$ is, up to the factor $E(z)/n'$, a realisation of an approximately $CS(n')$-distributed random variable. The approximation is relatively good for positive $a_i$ (see also the remarks after Theorem 6.2).

Table 6.7 Test statistics for testing hypotheses and distributions of these test statistics.

Test statistic                    H_0                       Distribution of the test statistic                                                                                          Distribution of the test statistic under H_0
$F_{AB} = MS_{AB}/MS_{ABC}$       $\sigma_{ab}^2 = 0$       $\frac{cn\sigma_{ab}^2 + n\sigma_{abc}^2 + \sigma^2}{n\sigma_{abc}^2 + \sigma^2}\,F[(a-1)(b-1),\ (a-1)(b-1)(c-1)]$          $F[(a-1)(b-1),\ (a-1)(b-1)(c-1)]$
$F_{AC} = MS_{AC}/MS_{ABC}$       $\sigma_{ac}^2 = 0$       $\frac{bn\sigma_{ac}^2 + n\sigma_{abc}^2 + \sigma^2}{n\sigma_{abc}^2 + \sigma^2}\,F[(a-1)(c-1),\ (a-1)(b-1)(c-1)]$          $F[(a-1)(c-1),\ (a-1)(b-1)(c-1)]$
$F_{BC} = MS_{BC}/MS_{ABC}$       $\sigma_{bc}^2 = 0$       $\frac{an\sigma_{bc}^2 + n\sigma_{abc}^2 + \sigma^2}{n\sigma_{abc}^2 + \sigma^2}\,F[(b-1)(c-1),\ (a-1)(b-1)(c-1)]$          $F[(b-1)(c-1),\ (a-1)(b-1)(c-1)]$
$F_{ABC} = MS_{ABC}/MS_{res}$     $\sigma_{abc}^2 = 0$      $\frac{n\sigma_{abc}^2 + \sigma^2}{\sigma^2}\,F[(a-1)(b-1)(c-1),\ N - abc]$                                                  $F[(a-1)(b-1)(c-1),\ N - abc]$


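We further need the following corollary to this lemma:

Corollary 6.2 If the $MS_i$ are independent of each other and if the $z_i = \frac{MS_i\,n_i}{E(MS_i)}$ are $CS(n_i)$-distributed $(i = 1, \dots, k)$, then

$$F = \frac{\sum_{i=r}^{s} MS_i}{\sum_{i=u}^{v} MS_i}$$

is under the null hypothesis $H_0: \sigma_x^2 = 0$ approximately $F(n, m)$-distributed with

$$n = \frac{\left(\sum_{i=r}^{s} MS_i\right)^2}{\sum_{i=r}^{s} \dfrac{MS_i^2}{n_i}}, \qquad m = \frac{\left(\sum_{i=u}^{v} MS_i\right)^2}{\sum_{i=u}^{v} \dfrac{MS_i^2}{n_i}}$$

if

$$E\left(\sum_{i=r}^{s} MS_i\right) = c\sigma_x^2 + E\left(\sum_{i=u}^{v} MS_i\right)$$

and the second summand of the right side is positive.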
Gaylor and Hopper (1969) could show by simulation experiments that the difference $MS_D$ between $MS_I$ and $MS_{II}$, which are independent of each other with degrees of freedom $n_I$ and $n_{II}$, respectively, multiplied with $\frac{n_D}{E(MS_D)}$, that is,

$$\frac{n_D}{E(MS_D)}MS_D = (MS_I - MS_{II})\frac{n_D}{E(MS_D)},$$

is approximately $CS(n_D)$-distributed if $n_D$ are the degrees of freedom of $MS_D$ and $\frac{n_I MS_I}{E(MS_I)}$ is exactly (or approximately) $CS(n_I)$-distributed and $\frac{n_{II} MS_{II}}{E(MS_{II})}$ is exactly (or approximately) $CS(n_{II})$-distributed, with

$$n_D = \frac{(MS_I - MS_{II})^2}{\dfrac{MS_I^2}{n_I} + \dfrac{MS_{II}^2}{n_{II}}}.$$

The approximation is sufficient as long as

$$\frac{MS_I}{MS_{II}} > F(n_{II}, n_I, 0.975)\,F(n_I, n_{II}, 0.50).$$

We use this corollary to construct test statistics for the null hypotheses $H_{A0}: \sigma_a^2 = 0$, $H_{B0}: \sigma_b^2 = 0$ and $H_{C0}: \sigma_c^2 = 0$, which are approximately F-distributed. From Table 6.6 we find


$$\begin{aligned}
E(MS_A) &= bcn\sigma_a^2 + E(MS_{AB} + MS_{AC} - MS_{ABC}) \\
E(MS_B) &= acn\sigma_b^2 + E(MS_{AB} + MS_{BC} - MS_{ABC}) \\
E(MS_C) &= abn\sigma_c^2 + E(MS_{AC} + MS_{BC} - MS_{ABC})
\end{aligned}$$

so that

$$F_A = \frac{MS_A}{MS_{AB} + MS_{AC} - MS_{ABC}}$$

is under $H_{A0}$ approximately $F(a_1, a_2)$-distributed,

$$F_B = \frac{MS_B}{MS_{AB} + MS_{BC} - MS_{ABC}}$$

is under $H_{B0}$ approximately $F(b_1, b_2)$-distributed, and

$$F_C = \frac{MS_C}{MS_{BC} + MS_{AC} - MS_{ABC}}$$

is under $H_{C0}$ approximately $F(c_1, c_2)$-distributed. From (6.63) we get

$$a_1 = a - 1, \quad b_1 = b - 1, \quad c_1 = c - 1,$$

$$a_2 = \frac{(MS_{AB} + MS_{AC} - MS_{ABC})^2}{\dfrac{MS_{AB}^2}{(a-1)(b-1)} + \dfrac{MS_{AC}^2}{(a-1)(c-1)} + \dfrac{MS_{ABC}^2}{(a-1)(b-1)(c-1)}}.$$

Analogous formulae are valid for $b_2$ and $c_2$.
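
The degrees of freedom (6.63) and the approximate test statistic $F_A$ can be computed as in the following minimal sketch. The function names are hypothetical; the formulas are those of Lemma 6.3 and of $a_2$ above, evaluated at observed mean squares.

```python
def satterthwaite_df(coeffs, mean_squares, dfs):
    """Approximate degrees of freedom n' of (6.63) for sum(a_i * MS_i)."""
    combo = sum(c * m for c, m in zip(coeffs, mean_squares))
    denom = sum((c * m) ** 2 / n for c, m, n in zip(coeffs, mean_squares, dfs))
    return combo ** 2 / denom


def approx_f_for_factor_a(ms_a, ms_ab, ms_ac, ms_abc, a, b, c):
    """Return (F_A, a_1, a_2) for testing H_A0 in the balanced three-way
    cross-classification of model II."""
    denominator = ms_ab + ms_ac - ms_abc  # may be <= 0 in unlucky samples
    a1 = a - 1
    a2 = satterthwaite_df(
        [1, 1, -1],
        [ms_ab, ms_ac, ms_abc],
        [(a - 1) * (b - 1), (a - 1) * (c - 1), (a - 1) * (b - 1) * (c - 1)],
    )
    return ms_a / denominator, a1, a2
```

Because one coefficient is negative, the denominator can become zero or negative and the approximation can be poor; this is the situation in which the Gaylor and Hopper criterion quoted above is meant to be checked.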
As Davenport and Webster (1973) show, it is sometimes better to use in place of $F_A$, $F_B$ and $F_C$, respectively, the test statistics

$$F_A^* = \frac{MS_A + MS_{ABC}}{MS_{AB} + MS_{AC}}, \quad F_B^* = \frac{MS_B + MS_{ABC}}{MS_{AB} + MS_{BC}}$$

and

$$F_C^* = \frac{MS_C + MS_{ABC}}{MS_{AC} + MS_{BC}},$$

respectively. Here again the Satterthwaite approximation is used; for instance, $F_A^*$ is approximately $F(a_1^*, a_2^*)$-distributed with

$$a_1^* = \frac{(MS_A + MS_{ABC})^2}{\dfrac{MS_A^2}{a-1} + \dfrac{MS_{ABC}^2}{(a-1)(b-1)(c-1)}}$$

and

$$a_2^* = \frac{(MS_{AB} + MS_{AC})^2}{\dfrac{MS_{AB}^2}{(a-1)(b-1)} + \dfrac{MS_{AC}^2}{(a-1)(c-1)}}.$$
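For the case of unequal subclass numbers, we use model equation (6.62) but now $l$ runs $l = 1, \dots, n_{ijk}$. Analogously to the two-way cross-classification, we construct quasi-SS (corresponding to the SS of Table 6.6) as, for instance,

$$QSS_A = \sum_{i=1}^{a}\frac{Y_{i...}^2}{N_{i..}} - \frac{Y_{....}^2}{N},$$

$$QSS_{AB} = \sum_{i=1}^{a}{}^{*}\sum_{j=1}^{b}{}^{*}\frac{Y_{ij..}^2}{N_{ij.}} - \sum_{i=1}^{a}\frac{Y_{i...}^2}{N_{i..}} - \sum_{j=1}^{b}\frac{Y_{.j..}^2}{N_{.j.}} + \frac{Y_{....}^2}{N},$$

where $\sum^*$ means summing up only over subclasses with $N_{ij.} > 0$. So we obtain the ANOVA table (Table 6.8).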
In Table 6.8 is 


b 2 
a Soca a © N2 
dab = ] = a dae = eet ik 
Sane arm 


b a 2 b c 2 
a wie Nip. 2 = NG 
bia = ee ET, | bc = 
j=l Nj. jel Nj 
b c ie) b a c 2 
j= 1k = ijk ype ea 
Na,be = y , Ab,ac = 


A he al Nj. 


Table 6.8 Analysis of variance table of a three-way cross-classification for model Il. 


Source of variation 


Between A-levels 


Between B-levels 


Between C-levels 


Interaction A x B 


Interaction A x C 


Interaction B x C 


Interaction A x Bx C 


Residual 


Quasi-SS Quasi-df 


QSS, a1 
QSS, b= 1 
QS c-l 


QSSaz  Pap-a-b+1° 


QWSrc Pac a-c+1* 


QSSzc Porc - b- + 1° 


QSSazc P~ Pab~ Pac ~ Phe + 
a+b+c-1* 


QSS,.; N-p 


Quasi-MS 


QMS 4 


QMS; 


QMS- 


QSMaz 


QMS ac 


QMSpc 


QMS,gc 


QMS yes 


Coefficients the variance components in E(QMS) 


N-k, 
a-1 
Av,a-Ka 
b-1 
Nea Ka 
c-1 
Ka-Ava 
Pab-4-b+1 
Ka- ea 
Pac-4-C+1 


Avea -Ab,a Aca + Ka 


Pbe-b-c+1 


CA 


0 


% 
4a,b — Ko 
a-1 
N-kg 
a-1 
Jeo —ke 
c-1 
Kp —Aa,b 
Pab-a-b+1 


Aacb ~4ab + ky -Aeva 
Pac-@-c+1 


Ke- Abe 
Poo-b-c+1 
CB 
10) 


(Continued) 


Table 6.8 (Continued) 


oe oa» Sic Sie Save 
Aase~Ke Aap Kav Ja,be ~ Koc 2a, be ~ Kabe 
a-1 a-1 a-1 a-1 
Ave ke Ava-K Avje—kbe Avyac — Kabe 
b-1 b-1 b-1 b-1 
N-k, Acab ~ Kab Ave~kbe Acab ~ Kabe 
c-1 c-1 c-l1 c-1 
2abye~ hase ~ Ab, + Kabe N~ha,b —Ab,a + Kab Aab,c — hae ~Ab,ac + Kae Aab,c—Aa,be ~Ab,e + kbe Aab,c~Aa,be ~ Ab,ac + Kabe 
Pab-a-b+1 Pab-4-b+1 Pap-4-b+1 Pap-a-b+1 Pap-4-b4+1 
Ke - dae 4acb ~4a,b ~Ac,ab + Kab N~AaeAea + kac Aac,b ~ habe ~4e,b + koe 2acb ~ Aa,be ~ Acab + Kabe 
Pac-4-c+1 Pac-@-C+1 Pac-@-cC+1 Pac-a-c+1 Pac-4-cC+1 
Ke-Abe Avea ~4b,a ~Ac,ab + Kab Avesa = Abyac ~ Aca + Kae N—-Age~Aoa t Koc Area ~Ab,ac ~ Acab + Aabe 
Pbe-b-c+1 Poc-b-c+1 Poe-b-c+1 Poe-b-c+1 Poc-b-c+1 
Cc CAB CAC Cac CABC 
0 0 10) 0 0 


“p = number of subclasses with at least one observation, p,, = number of Ny. > 0, pyc = number of Nj. >0, Pac = number of N;.;> 0. 
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roma a 
b 2 
2 Wx yee 
Aab,c = , Aac,b = ea St 
i=lj= Ny. fika Nix 
b a 


a 


5 oe . ijl a ij 
dete dy aed ad tn 


i=1j=1 ij i=1k=1 j=lk=1 


a b c 
kasd Nie BaD NG hea ON 
1 boc 
Kap “iy 2 Kac = =v ik? kye= > Nik 


i=1 j=l ielk= jalk=l 


V6A = Aba + Aca Abe,a — Kas 
veg =Aa,b + Ac,b —Aac,b — Ko, 
vec = hae + Abc + Aab,c —Kes 
VCAB = Aa,b + Aba + Ac,ab — Aac,b — Abca — Kab; 
VCAC = Jae + Ab,ac + Aea —Aab,c —Abe,a —Kacs 


VCBC = Ra,be + Ab,c aH Aeb —Aab,c —2ac,b =, Koes 


VCABC = N+ Aa,be + Ab, ac + Ac,ab Aab,c Aac,b Abe,a Kabes 


where 


V=P-Pab—Pac—-Poc + a+b+c-1. 


From the equations obtained from the coefficients of the E(QMS) when replacing the E(QMS) by the QMS and $\sigma_x^2$ by $s_x^2$, we receive the estimators of the $\sigma_x^2$.


6.3.5 Three-Way Nested Classification 


For the three-way nested classification $C \prec B \prec A$, the following model equation is assumed:

$$y_{ijkl} = \mu + a_i + b_{ij} + c_{ijk} + e_{ijkl} \quad (i = 1, \dots, a;\ j = 1, \dots, b_i;\ k = 1, \dots, c_{ij};\ l = 1, \dots, n_{ijk}). \tag{6.64}$$

The side conditions are that all random variables of the right-hand side of (6.64) have expectation 0 and are pairwise uncorrelated and $\operatorname{var}(a_i) = \sigma_a^2$ for all $i$, $\operatorname{var}(b_{ij}) = \sigma_b^2$ for all $i, j$, $\operatorname{var}(c_{ijk}) = \sigma_c^2$ for all $i, j, k$ and $\operatorname{var}(e_{ijkl}) = \sigma^2$ for all $i, j, k, l$.


Because Theorem 5.12 is independent of the model, we find the SS, df and MS of
the three-way nested ANOVA in Table 5.27. For the calculation of the E(MS) we need


D2. No ES) Ne 
i j 

Fiy= Sony F;= SF ip 
k j 


Fj 


Ay = Pea, 
ij Nj: 


The E(MS) can be found in Table 6.9. By the ANOVA method, we gain the 


=>) 


N; 


F; 


following estimators for the variance components: 


8’ = MS res 

C..-B 
2 = -(MS in — MS res 
Ss, Naa, | & B ) 

B.-a Ay —Ag 
2 = MS. in - MS, — a 
°b ral Pas Ba 

EF F 
Ag-—= A3-—= 
2_ 4-1 N 2 N 2 
ae MS 4 - MS res 4 Ss. A Si, 
N 


The variances of these variance components can be found in Searle (1971) and are not repeated here for reasons of space.


Table 6.9 Expectations of the MS of a three-way nested classification for 


model Il. 


Source of variation 


Between the A-levels 


Between the B-levels 
within the A-levels 


Between the C-levels 


within the B- and A-levels 


Residual 


E(MS) 
A - A3-4 

eee 
A, -4 N-A 

2 +02 z a hae 3 
B.- B.-a 
N-A 

O+OCG 3 
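6.3.6 Three-Way Mixed Classification

We consider the mixed classifications in Sections 5.4.3.1 and 5.4.3.2 for model II of the ANOVA. The model equation for the type $(B \prec A) \times C$ is

$$y_{ijkl} = \mu + a_i + b_{ij} + c_k + (a,c)_{ik} + (b,c)_{ijk} + e_{ijkl} \quad (i = 1, \dots, a;\ j = 1, \dots, b;\ k = 1, \dots, c;\ l = 1, \dots, n). \tag{6.65}$$

The model equation for the type $C \prec AB$ is

$$y_{ijkl} = \mu + a_i + b_j + c_{ijk} + (a,b)_{ij} + e_{ijkl} \quad (i = 1, \dots, a;\ j = 1, \dots, b;\ k = 1, \dots, c;\ l = 1, \dots, n). \tag{6.66}$$

Again we assume that the random components of the right-hand side of (6.65) and of (6.66) have expectation zero, are pairwise uncorrelated and have for all indices the same variances, namely for (6.65)

$$\operatorname{var}(a_i) = \sigma_a^2, \quad \operatorname{var}(b_{ij}) = \sigma_{b \text{ in } a}^2, \quad \operatorname{var}(c_k) = \sigma_c^2, \quad \operatorname{var}((a,c)_{ik}) = \sigma_{ac}^2, \quad \operatorname{var}((b,c)_{ijk}) = \sigma_{bc \text{ in } a}^2, \quad \operatorname{var}(e_{ijkl}) = \sigma^2$$

and for (6.66)

$$\operatorname{var}(a_i) = \sigma_a^2, \quad \operatorname{var}(b_j) = \sigma_b^2, \quad \operatorname{var}(c_{ijk}) = \sigma_{c \text{ in } ab}^2, \quad \operatorname{var}((a,b)_{ij}) = \sigma_{ab}^2, \quad \operatorname{var}(e_{ijkl}) = \sigma^2.$$
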
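The decomposition of the SS and df is given in Sections 5.4.3.1 and 5.4.3.2. To estimate the variance components by the ANOVA method, we need the E(MS). Following Rasch (1971) we obtain for the type $(B \prec A) \times C$

$$\begin{aligned}
E(MS_A) &= \sigma^2 + n\sigma_{bc \text{ in } a}^2 + bn\sigma_{ac}^2 + cn\sigma_{b \text{ in } a}^2 + bcn\sigma_a^2 \\
E(MS_{B \text{ in } A}) &= \sigma^2 + n\sigma_{bc \text{ in } a}^2 + cn\sigma_{b \text{ in } a}^2 \\
E(MS_C) &= \sigma^2 + bn\sigma_{ac}^2 + abn\sigma_c^2 + n\sigma_{bc \text{ in } a}^2 \\
E(MS_{AC}) &= \sigma^2 + bn\sigma_{ac}^2 + n\sigma_{bc \text{ in } a}^2 \\
E(MS_{BC \text{ in } A}) &= \sigma^2 + n\sigma_{bc \text{ in } a}^2 \\
E(MS_{res}) &= \sigma^2
\end{aligned} \tag{6.67}$$

and for the type $C \prec AB$

$$\begin{aligned}
E(QMS_A) &= \sigma^2 + cn\sigma_{ab}^2 + n\sigma_{c \text{ in } ab}^2 + bcn\sigma_a^2 \\
E(QMS_B) &= \sigma^2 + cn\sigma_{ab}^2 + n\sigma_{c \text{ in } ab}^2 + acn\sigma_b^2 \\
E(QMS_{C \text{ in } AB}) &= \sigma^2 + n\sigma_{c \text{ in } ab}^2 \\
E(QMS_{AB}) &= \sigma^2 + cn\sigma_{ab}^2 + n\sigma_{c \text{ in } ab}^2 \\
E(QMS_{res}) &= \sigma^2.
\end{aligned} \tag{6.68}$$

By the ANOVA method, we obtain the estimators of the variance components by replacing in (6.67) and (6.68) $\sigma_x^2$ by $s_x^2$ and $E(MS_x)$ by $MS_x$ and solving the equations for $s_x^2$.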


6.4 Planning Experiments 


A systematic description of designing experiments for the one-way ANOVA and definitions of several optimality criteria are given by Herrendörfer (1976), on which the results of this section are based. We start with model equation (6.8) with its side conditions. Further, all random effects in (6.8) are assumed to be normally distributed. As estimators of $\sigma_a^2$ and $\sigma^2$, we choose (6.12) and $MS_{res}$, respectively. We use the following notations: $\Sigma^T = (\sigma_a^2, \sigma^2)$ and $\hat\Sigma^T = (s_a^2, s^2)$ with $s_a^2$ from (6.12) and $s^2 = MS_{res}$ from Table 5.2.


Definition 6.6 The vector $V_N = (a, n_1, \dots, n_a)_N$ is called concrete experimental design for the estimation of $\Sigma$, if $2 \le a \le N - 1$, $n_i \ge 1$, $\sum_{i=1}^{a} n_i = N$ where $a$ and $n_i$ are integers. ${}_0V_N = (a, n_1, \dots, n_a)_N$ is called discrete experimental design for the estimation of $\Sigma$, if $2 \le a \le N - 1$, $n_i \ge 1$, $\sum_{i=1}^{a} n_i = N$ where $a$ and $N$ are integers, but the $n_i$ may be real. With $\{V_N\}$ and $\{{}_0V_N\}$ we denote the sets of possible concrete or discrete experimental designs, respectively, for fixed $N$.

We see that $\{V_N\} \subset \{{}_0V_N\}$.


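Definition 6.7 An experimental design ${}_0V_N^* \in \{{}_0V_N\}$ ($V_N^* \in \{V_N\}$) is called discrete (concrete) A-optimal experimental design for given $N$, if for this experimental design

$$\operatorname{var}(s_a^2) + \operatorname{var}(s^2) = \frac{1}{w^2}\left[\operatorname{var}(MS_A) + \operatorname{var}(MS_{res})\right] + \operatorname{var}(MS_{res})$$

with

$$w = \frac{1}{a-1}\left(N - \frac{1}{N}\sum_{i=1}^{a} n_i^2\right)$$

in the set $\{{}_0V_N\}$ ($\{V_N\}$) is minimal.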

Theorem 6.14 (Herrendörfer)
The discrete A-optimal experimental design in $\{{}_0V_N\}$ for estimating $\Sigma$ must be found amongst the designs with equal subclass numbers ($n_i = n$).

Proof: The formulae (6.38) and (6.39) are initially defined only for natural $a$ and $n_i$ $(i = 1, \dots, a)$. For a discrete experimental design, we allow real $n_i \ge 1$. For fixed $N$ and $a$, $w$ is maximal for $n_i = \frac{N}{a} = \bar n$. Because $n_i = \bar n$ minimises $\frac{1}{w^2}\operatorname{var}(MS_A)$ (Hammersley, 1949) and $\operatorname{var}(MS_{res})$ is independent of the decomposition of $N$ into the $n_i$, the theorem is proved because this is true for all pairs $(a, N)$.


Thus for the determination of a discrete A-optimal experimental design, 
the term 


(6.69) 


ON ast ae) aces 


1 [2(fo2 +02)? — 204 2o! 
7g 


must be minimised. Putting for p;= p, for p; in Definition 6.3, (6.69) becomes 


2 i=p\.. 4 5 
A(N,a) = 1 
(N®) faci" <) eg 


From Definition 6.3 and o” > 0, it follows always 0 < p< 1. 
Looking at the second partial derivation of A(N, a) with respect to a, we see 
that A(N,a) for 1<a<WN is convex from below and for 0<@<1 therefore 
dA(N,a) 
Oa 


A(N, a) has exactly one relative minimum. Putting equal to zero gives 


the two solutions 


w-1)(o+ “5? 


a, =1+ — (6.70) 
p+? -y3(1-p) 
and 
l-p 
(NE pe 
prea a w) (6.71) 
p+ +V2(1-p) 


But @, is not in the interval 0 < a < N - 1, and with this only the solution a in (6.71) 
is acceptable. If a is an integer and 2 < a < N, the A-optimal design is given by 
7 NINe + (1+ v2) -p)p] (6.72) 
Np+ (1 +NvV/2)(1-p) 


N 
and 7= —-—. 
a 
If $a' < a < a''$ with $a'' = a' + 1$ ($a'$, $a''$ integers) and only $a'$ or $a''$ is in the interval $[2, N]$, then the integer in this interval is the $a$ of the A-optimal discrete design. If both numbers $a'$ and $a''$ are in $[2, N]$, we calculate $A(N, a)$ for both and choose as optimal the one for which $A(N, a)$ is minimal.

We find the concrete A-optimal experimental design by systematic search in the neighbourhood of the discrete A-optimal experimental design. In this systematic search, we also vary $a$, and of course unequal $n_i$ can occur. Theorems about optimal experimental designs to minimise the variance of a variance component (so-called C-optimal designs) and the cost-optimal choice of $N$ can be found in Herrendörfer (1976). There and in Rasch et al. (2008), tables of optimal designs and experimental sizes are given.
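
The search over the number of classes can also be done numerically instead of via the closed-form solution. The sketch below is an illustration under stated assumptions, not the procedure of Herrendörfer (1976): it evaluates $\operatorname{var}(s_a^2) + \operatorname{var}(s^2)$ with the usual normal-theory variances of the balanced one-way model II, takes $\rho = \sigma_a^2/(\sigma_a^2 + \sigma^2)$ as the intraclass correlation (assumed to be what Definition 6.3 denotes), scales the total variance to 1 and allows a real class size $\bar n = N/a$. The function names are hypothetical.

```python
def a_criterion(N, a, rho):
    """var(s_a^2) + var(s^2) for a classes of (possibly real) size N / a.

    Assumes a balanced one-way model II with normal effects and
    sigma_a^2 = rho, sigma^2 = 1 - rho (total variance scaled to 1).
    """
    n_bar = N / a
    sigma2_a, sigma2 = rho, 1.0 - rho
    var_ms_res = 2.0 * sigma2 ** 2 / (N - a)
    var_ms_a = 2.0 * (n_bar * sigma2_a + sigma2) ** 2 / (a - 1)
    var_s2_a = (var_ms_a + var_ms_res) / n_bar ** 2
    return var_s2_a + var_ms_res


def best_number_of_classes(N, rho):
    """Integer a in [2, N - 1] minimising the criterion by direct search."""
    return min(range(2, N), key=lambda a: a_criterion(N, a, rho))
```

A direct search of this kind is cheap for moderate $N$ and avoids rounding a real-valued solution by hand; the resulting $a$ need not divide $N$, which is why the concrete design is then sought in its neighbourhood.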


6.5 Exercises 


6.1 In performance testing of boars, the fattening and slaughter performances of the offspring of boars are measured under uniform feeding. From the results of such testing, two boars $b_1$ and $b_2$ have been randomly selected. For each boar, the offspring of several sows have been observed. For $b_1$ as well as for $b_2$, observations $y$ (number of fattening days from 40 kg up to 110 kg) from the offspring of three sows are available. The variance components for boars, for sows (within boars) and within sows must be estimated.

Table 6.10 shows the observations $y_{ijk}$. This case is $a = 2$, $b_1 = 3$, $b_2 = 3$. The E(MS) are given in Table 6.6.

Table 6.10 Data of Example 6.1 (number of fattening days $y_{ijk}$).

Boars                      b_1                      b_2
Sows                 s_1    s_2    s_3        s_4    s_5    s_6
Offspring y_ijk       93    107    109         89     87     81
                      89     99    107        102     91     83
                      97            94        104     82     85
                     105           106         97            91
n_ij                   4      2      4          4      3      4
N_i.                         10                       11

6.2 Determine the A-optimal experimental design by (6.71) for $N = 200$ and $\rho = 0.5$.

6.3 Add in Example 6.1 for sow 5 the missing value by the corresponding mean (2 decimal places) and add for sow 2 the mean twice. Estimate the variance components for the new data set.
6.4 Derive

$$\begin{aligned}
E(MS_A) &= bn\sigma_a^2 + n\sigma_{ab}^2 + \sigma^2 \\
E(MS_B) &= an\sigma_b^2 + n\sigma_{ab}^2 + \sigma^2 \\
E(MS_{AB}) &= n\sigma_{ab}^2 + \sigma^2 \\
E(MS_{res}) &= \sigma^2
\end{aligned}$$

using the rules of Chapter 7.
Analysis of Variance - Models with Finite Level Populations and Mixed Models


In the present chapter we consider models with factor levels from a finite population of factor levels, or as we call them in short 'finite level populations'. They cover, at least in the case of equal subclass numbers, model I and model II of the analysis of variance (ANOVA) as special cases. Even the mixed models, also introduced in this chapter, are limiting cases of models with finite level populations.

In mixed models as well, problems of variance component estimation and also of estimating and testing fixed effects occur. In Section 7.3, some special methods are presented, which are demonstrated for some special cases in Section 7.4.


7.1 Introduction: Models with Finite Level Populations 


Models with finite level populations are of interest because we meet practical 
situations where the selection of factor levels covers a finite number of levels 
but not all levels in a population with a finite number of levels and further 
because other models are special or limiting cases of such models. 


Definition 7.1 Let the elements $\gamma_{A_k,j}$ $(j = 1, \dots, a_k)$ of the vectors $\gamma_{A_k}$ in model equation (6.3) be $a_k$ random variables. The realisations of those $a_k$ random variables are $a_k$ effects, sampled (without replacement) from a population of $N(A_k)$ effects. Then we call the model equation (6.3) (under the side conditions that the effects in the populations sum up to zero) an ANOVA model with finite level populations. If (6.43) holds, we speak about a balanced case of the model with finite level populations.

This means we assume that the $a_k$ effects in an experiment are selected randomly from a level population with $N(A_k) \ge a_k$ effects and that each level can be selected only once. If $N(A_k) = a_k$, all levels are selected and the factor $A_k$ is a fixed factor in a model I. If $N(A_k) \to \infty$, then the factor $A_k$ is a random factor of model II.
In the balanced case we present in Section 7.2 simple rules for the derivation of SS, df, MS and E(MS).

For simplicity we call an n-dimensional random variable with identically distributed (but not independent) components a type 2 random sample, as is usual in the theory of sampling from finite populations.

In the theory of sampling from finite populations, the variance in the population is defined as the sum of squared deviations from the expectation divided by the population size $N$. But in the ANOVA, to simplify further formulae, we use the quasi-variance, obtained by dividing the sum of squared deviations from the expectation by $N - 1$. We denote the quasi-variances by $\sigma^2$, $\sigma_a^2$ and so on but the real variances by $\sigma^{*2}$, $\sigma_a^{*2}$. The conversion of variances into quasi-variances and vice versa is demonstrated by the example below.


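Example 7.1 Let the cross-classified factors $A$ and $B$ be nested in the factor $C$. Then the variance component of the interaction $A \times B$ is

$$\sigma_{ab \text{ in } c}^2 = \frac{N(A)\cdot N(B)}{(N(A)-1)(N(B)-1)}\,\sigma_{ab \text{ in } c}^{*2} \tag{7.1}$$

where

$$\sigma_{ab \text{ in } c}^2 = \frac{\sum_{i=1}^{N(A)}\sum_{j=1}^{N(B)}(ab)_{ij(k)}^2}{(N(A)-1)(N(B)-1)} \quad \text{for all } k. \tag{7.2}$$

Example 7.2 We consider a model with finite level populations and two factors and a factor level combination, where $A_1 = A$, $A_2 = B$, $A_3 = A \times B$, $a_1 = a$, $a_2 = b$ and $a_3 = ab$ and $R$ stands for the residual. The model equation for the balanced case is

$$y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_{ijk} \quad (i = 1, \dots, a;\ j = 1, \dots, b;\ k = 1, \dots, n). \tag{7.3}$$

The side conditions are

$$n \le N(R), \quad \sum_{i=1}^{N(A)} a_i = \sum_{j=1}^{N(B)} b_j = \sum_{i=1}^{N(A)} (ab)_{ij} = \sum_{j=1}^{N(B)} (ab)_{ij} = \sum_{k=1}^{N(R)} e_{ijk} = 0,$$

$$\frac{1}{N(A)-1}\sum_{i=1}^{N(A)} a_i^2 = \sigma_a^2, \quad \frac{1}{N(B)-1}\sum_{j=1}^{N(B)} b_j^2 = \sigma_b^2,$$

$$\frac{1}{[N(A)-1][N(B)-1]}\sum_{i=1}^{N(A)}\sum_{j=1}^{N(B)} (ab)_{ij}^2 = \sigma_{ab}^2, \quad \frac{1}{N(R)-1}\sum_{k=1}^{N(R)} e_{ijk}^2 = \sigma^2$$

for all $i$ and $j$ that are not summation indices.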
We can derive the following expectations after inserting the right-hand side of the model equation for $y$ in the E(MS). With the notations of Table 5.13, this results in

$$\begin{aligned}
E(MS_A) &= \left(1 - \frac{n}{N(R)}\right)\sigma^2 + n\left(1 - \frac{b}{N(B)}\right)\sigma_{ab}^2 + nb\sigma_a^2, \\
E(MS_B) &= \left(1 - \frac{n}{N(R)}\right)\sigma^2 + n\left(1 - \frac{a}{N(A)}\right)\sigma_{ab}^2 + na\sigma_b^2, \\
E(MS_{AB}) &= \left(1 - \frac{n}{N(R)}\right)\sigma^2 + n\sigma_{ab}^2, \\
E(MS_{res}) &= \sigma^2.
\end{aligned} \tag{7.4}$$

If $N(R) \to \infty$, $N(B) \to \infty$ and $N(A) \to \infty$, we obtain the E(MS) of model II in (6.53). For $N(R) \to \infty$, $a = N(A)$ and $b = N(B)$, we obtain the E(MS) of model I in Table 5.13. For $N(R) \to \infty$, $N(B) \to \infty$ and $a = N(A)$, we get the model of Example 7.2, a mixed model ($A$ fixed, $B$ random), and we receive

$$\begin{aligned}
E(MS_A) &= \sigma^2 + n\sigma_{ab}^2 + nb\sigma_a^2 \\
E(MS_B) &= \sigma^2 + na\sigma_b^2 \\
E(MS_{AB}) &= \sigma^2 + n\sigma_{ab}^2 \\
E(MS_{res}) &= \sigma^2.
\end{aligned} \tag{7.5}$$

In (7.5) we put $\sigma_a^2 = \frac{1}{a-1}\sum_{i=1}^{a} a_i^2$, but this is no variance.

Example 7.2 shows the potential of models with finite level populations. In the balanced case simple rules for the calculation of the E(MS) exist.


7.2 Rules for the Derivation of SS, df, MS and E(MS) 
in Balanced ANOVA Models 


In Chapters 5 and 6, we could see that the derivation of E(MS) even in simple cases is elaborate. Now we give rules by which formulae for SS, df, MS and E(MS) for a balanced case can be easily derived.

Let us consider $t$ factors $A_k$, $k = 1, \dots, t$, in an ANOVA with the sizes of the level populations $N(A_k)$ and the numbers of selected levels $a_k$. (If there are few factors, we rename $A_1 = A$, $A_2 = B$, $A_3 = C$.) If a factor $A_{k_1}$ is subordinated to a factor $A_{k_2}$, we write $A_{k_1} \prec A_{k_2}$. The indices of the effects in the model equations are split into two groups. The indices in any suffix of subordinated (nested) factors are given first; then the indices of the superordinate factors or factor combinations follow in a bracket such as $e_{k(i,j)}$ or $e_{l(i,j,k)}$. In the ANOVA table a row exists for each factor (including residual). Further, there are rows for factor combinations (interactions). If a factor $X$ is not subordinated to any factor, we write $X \prec$.

Rule 1 Interactions between two factors or factor combinations are obtained formally by symbolic multiplication of the factors (or factor combinations) both left of the $\prec$ sign and right of the $\prec$ sign. If the same letter occurs to the right of the $\prec$ sign more than once, it is noted only once ($X \cdot X = X$). An interaction is not defined if the same letter occurs on both sides of the $\prec$ sign.

Rule 2 The degrees of freedom in a row are obtained by reducing the number of occurring levels of the $A_i$ left of the $\prec$ sign by 1 and multiplying with the analogously reduced numbers of the other factors left of the $\prec$ sign as well as with the numbers of the selected levels of the factors right of the $\prec$ sign.

Rule 3 The SS in a row are obtained by forming a product of the $S_{A_i}$ in (6.45) analogous to the product of the degrees of freedom, which means that an $A_i$ left of the $\prec$ sign gives a factor $S_{A_i} - e$ in the product, but an $A_i$ right of the $\prec$ sign gives a factor $S_{A_i}$ in the product. The correction term $e$ is the identity element of this multiplication ($S_{A_i}e = eS_{A_i} = S_{A_i}$) and defined by $e = \frac{1}{N}Y_{\cdot\cdots\cdot}^2$. Further, as a result of that symbolic multiplication, $S_{A_i}S_{A_j}$ is to be read as $S_{A_iA_j}$ and $S_{RA_1,\dots,A_t}$ as $S_R$.

Rule 4 The E(MS) are calculated as follows: define a table with rows defined by the components (except $\mu$) of the right-hand side of the model equation. The columns correspond with the indices in the suffix. If in a cell of the table the index defining the column does not occur in the effect defining the row, we fill the cell with the number of selected levels of the factor defining the column. If the index defining the column occurs in the bracket of the row effect, we put a 1 into the corresponding cell of the table; otherwise we put there

$$1 - \frac{\text{number of selected levels of the column}}{\text{number of levels of the column}}.$$

Now each E(MS) is written as a linear combination of $\sigma^2$ and all the variance components whose suffixes contain the upper bounds of those indices occurring in the effect corresponding to the MS in front of the bracket. The coefficients of the linear combination are generated in that row of the table corresponding with the variance component by multiplication of the contents of those cells in that row defined by a column suffix not in the bracket of the effect defining the MS. Finally, we convert as shown in Example 7.1.


Example 7.3 Given a two-way cross-classification with finite level populations as in Example 7.2. At first, the model equation (7.3) is assumed. The SS, df and MS of Table 5.13 as well as the E(MS) are to be stated according to the rules of this section. The ANOVA table has to contain the rows for $A$, $B$, $AB$ and residual. At first, we have to put the indices of superordinate factors in brackets. Because only the error terms are subordinated to all factors, (7.3) becomes

$$y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_{k(i,j)}.$$

Only one interaction exists, $(A\prec)(B\prec) = AB\prec$. The degrees of freedom following rule 2 are (res and $R$ mean residual)

$$\begin{aligned}
A\prec &: a - 1, \\
B\prec &: b - 1, \\
AB\prec &: (a-1)(b-1), \\
R \prec AB &: (n-1)ab.
\end{aligned}$$

The SS following rule 3 are

$$\begin{aligned}
A\prec &: S_A - e = \sum_{i=1}^{a}\frac{Y_{i..}^2}{bn} - \frac{Y_{...}^2}{N}, \\
B\prec &: S_B - e = \sum_{j=1}^{b}\frac{Y_{.j.}^2}{an} - \frac{Y_{...}^2}{N}, \\
AB\prec &: (S_A - e)(S_B - e) = S_{AB} - S_A - S_B + e = \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{Y_{ij.}^2}{n} - \sum_{i=1}^{a}\frac{Y_{i..}^2}{bn} - \sum_{j=1}^{b}\frac{Y_{.j.}^2}{an} + \frac{Y_{...}^2}{N}, \\
R \prec AB &: (S_R - e)S_AS_B = S_{RAB} - S_{AB} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{Y_{ij.}^2}{n}.
\end{aligned}$$

Then $SS_T$ is the sum of all SS:

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \frac{Y_{...}^2}{N}.$$

To determine the E(MS) following rule 4, we first construct the table defined there:

               i                  j                  k
a_i            1 - a/N(A)         b                  n
b_j            a                  1 - b/N(B)         n
(ab)_ij        1 - a/N(A)         1 - b/N(B)         n
e_k(i,j)       1                  1                  1 - n/N(res)

The column subscript $i$ of the first column does not occur in $b_j$, so that in the first cell in the second row, $a$ is placed. Because $j$ is not in $a_i$, $b$ is placed in the second cell of the first row. $k$ does not occur in $a_i$, $b_j$ and $(ab)_{ij}$; therefore the first three cells in the third column contain $n$. The indices $i$ and $j$ are for $e_{k(i,j)}$ in the bracket; therefore 1 stands in the first two cells of the last row. In all other cells of the first column, we put $1 - \frac{a}{N(A)}$; in the still free cells of the second column, we put $1 - \frac{b}{N(B)}$; and in the free cell of the last column, we put $1 - \frac{n}{N(res)}$.

Now, following rule 4,

$$E(MS_A) = c_1\sigma_a^2 + c_2\sigma_{ab}^2 + c_3\sigma^2$$

with

$$c_1 = bn, \quad c_2 = n\left(1 - \frac{b}{N(B)}\right), \quad c_3 = 1 - \frac{n}{N(res)}$$

and

$$E(MS_B) = c_4\sigma_b^2 + c_5\sigma_{ab}^2 + c_6\sigma^2$$

with

$$c_4 = an, \quad c_5 = n\left(1 - \frac{a}{N(A)}\right), \quad c_6 = 1 - \frac{n}{N(res)}$$

and

$$E(MS_{AB}) = c_7\sigma_{ab}^2 + c_8\sigma^2$$

with

$$c_7 = n, \quad c_8 = 1 - \frac{n}{N(res)}$$

and finally

$$E(MS_{res}) = \sigma^2.$$
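
The table construction of rule 4 is mechanical and can be sketched in code. The following is an illustrative implementation under explicit assumptions, not part of the original text: every effect is described by its indices in front of the bracket and its bracketed indices, and, as in Example 7.3, the columns belonging to the indices in front of the bracket of the effect defining the MS are left out of the product. All names are hypothetical; `math.inf` stands for an infinite level population.

```python
from math import inf, prod


def expected_ms_coefficients(effects, levels, populations):
    """Coefficients of the variance components in the E(MS) by rule 4.

    effects:     dict name -> (indices before the bracket, indices in the bracket)
    levels:      dict index -> number of selected levels
    populations: dict index -> size of the level population (may be inf)
    Returns dict MS name -> dict component name -> coefficient.
    The final conversion of Example 7.1 is not applied.
    """
    def cell(effect, idx):
        free, bracket = effects[effect]
        if idx in bracket:
            return 1.0
        if idx in free:
            return 1.0 - levels[idx] / populations[idx]
        return float(levels[idx])  # index does not occur in the row effect

    table = {}
    for ms_name, (ms_free, _) in effects.items():
        coeffs = {}
        for comp, (free, bracket) in effects.items():
            # The component enters if its suffix contains the MS indices.
            if set(ms_free) <= set(free) | set(bracket):
                columns = [idx for idx in levels if idx not in ms_free]
                coeffs[comp] = prod(cell(comp, idx) for idx in columns)
        table[ms_name] = coeffs
    return table


# Two-way cross-classification of Example 7.3 with A fixed (N(A) = a)
# and B and the residual from infinite populations: the mixed model (7.5).
effects = {"A": ("i", ""), "B": ("j", ""), "AB": ("ij", ""), "R": ("k", "ij")}
levels = {"i": 3, "j": 4, "k": 2}
mixed = expected_ms_coefficients(effects, levels, {"i": 3, "j": inf, "k": inf})
```

With these inputs the coefficient of $\sigma_{ab}^2$ in $E(MS_B)$ is zero, in agreement with (7.5); setting all populations to `inf` gives the model II coefficients.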


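Example 7.4 We determine df, SS and E(MS) for the ANOVA of type $C \prec AB$ in Section 5.4.3.2. We write model equation (5.48) as

$$y_{ijkl} = \mu + a_i + b_j + c_{k(i,j)} + (ab)_{ij} + e_{l(i,j,k)} \quad (i = 1, \dots, a;\ j = 1, \dots, b;\ k = 1, \dots, c;\ l = 1, \dots, n).$$

In the ANOVA table, we have rows for $A\prec$, $B\prec$, $C \prec AB$, $AB\prec$ and res.

The degrees of freedom are (rule 2)

$$\begin{aligned}
A\prec &: a - 1, \\
B\prec &: b - 1, \\
C \prec AB &: (c-1)ab, \\
AB\prec &: (a-1)(b-1), \\
R \prec CAB &: (n-1)abc.
\end{aligned}$$

The SS (rule 3) are

$$\begin{aligned}
A\prec &: SS_A = S_A - e = \sum_{i=1}^{a}\frac{Y_{i...}^2}{bcn} - \frac{Y_{....}^2}{N}, \\
B\prec &: SS_B = S_B - e = \sum_{j=1}^{b}\frac{Y_{.j..}^2}{acn} - \frac{Y_{....}^2}{N}, \\
C \prec AB &: SS_{C \text{ in } AB} = (S_C - e)S_AS_B = S_{CAB} - S_{AB} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\frac{Y_{ijk.}^2}{n} - \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{Y_{ij..}^2}{cn}, \\
AB\prec &: SS_{AB} = (S_A - e)(S_B - e) = S_{AB} - S_A - S_B + e, \\
R \prec CAB &: SS_R = (S_R - e)S_AS_BS_C = S_R - S_{ABC} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{l=1}^{n} y_{ijkl}^2 - \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\frac{Y_{ijk.}^2}{n}.
\end{aligned}$$

Following rule 4 we construct the table:

               i                  j                  k                  l
a_i            1 - a/N(A)         b                  c                  n
b_j            a                  1 - b/N(B)         c                  n
c_k(i,j)       1                  1                  1 - c/N(C)         n
(ab)_ij        1 - a/N(A)         1 - b/N(B)         c                  n
e_l(i,j,k)     1                  1                  1                  1 - n/N(R)

Then we get

$$\begin{aligned}
E(MS_A) &= bcn\sigma_a^2 + n\left(1 - \frac{c}{N(C)}\right)\sigma_{c \text{ in } ab}^2 + cn\left(1 - \frac{b}{N(B)}\right)\sigma_{ab}^2 + \left(1 - \frac{n}{N(R)}\right)\sigma^2, \\
E(MS_B) &= acn\sigma_b^2 + n\left(1 - \frac{c}{N(C)}\right)\sigma_{c \text{ in } ab}^2 + cn\left(1 - \frac{a}{N(A)}\right)\sigma_{ab}^2 + \left(1 - \frac{n}{N(R)}\right)\sigma^2, \\
E(MS_{C \text{ in } AB}) &= n\sigma_{c \text{ in } ab}^2 + \left(1 - \frac{n}{N(R)}\right)\sigma^2, \\
E(MS_{AB}) &= n\left(1 - \frac{c}{N(C)}\right)\sigma_{c \text{ in } ab}^2 + cn\sigma_{ab}^2 + \left(1 - \frac{n}{N(R)}\right)\sigma^2, \\
E(MS_{res}) &= \sigma^2.
\end{aligned}$$
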

7.3 Variance Component Estimators in Mixed Models 


Mixed models in the ANOVA are models where in Equation (6.3) at least one but not all $\gamma_{A_i}$ are random variables. More general mixed models are defined so that models I and II are special cases. We use this general definition, but nevertheless find it reasonable to consider model I and model II in separate chapters, as done in Chapters 5 and 6, and to use the methods developed for mixed models only if neither model I nor model II can be used.

Definition 7.2 Let $Y = (y_1, \dots, y_N)^T$ be an N-dimensional random vector depending on the effects $\gamma_{A_1}, \dots, \gamma_{A_s}, \gamma_{A_{s+1}}, \dots, \gamma_{A_r}$ of $r$ factors or factor combinations $A_i$ $(i = 1, \dots, r)$ with $a_i$ levels as

$$Y = \mu 1_N + \sum_{i=1}^{s} Z_{A_i}\gamma_{A_i} + \sum_{i=s+1}^{r} Z_{A_i}\gamma_{A_i} + e. \tag{7.6}$$

Equation (7.6) under the side conditions

$$\operatorname{var}(e) = \sigma^2E_N, \quad E(e) = 0_N, \quad \operatorname{cov}(\gamma_{A_i}, e) = O_{a_i,N} \ (i = s+1, \dots, r),$$

$$\operatorname{cov}(\gamma_{A_i}, \gamma_{A_j}) = O_{a_i,a_j} \ (i, j = s+1, \dots, r;\ i \ne j), \quad E(\gamma_{A_i}) = 0_{a_i}$$

is called a mixed model of the ANOVA.

Inserting in Definition 7.2

$$\beta_1 = \left(\mu, \gamma_{A_1}^T, \dots, \gamma_{A_s}^T\right)^T, \quad \beta_2 = \left(\gamma_{A_{s+1}}^T, \dots, \gamma_{A_r}^T\right)^T,$$

$$X_1 = \left(1_N, Z_{A_1}, \dots, Z_{A_s}\right) \quad \text{and} \quad X_2 = \left(Z_{A_{s+1}}, \dots, Z_{A_r}\right),$$

then (7.6) becomes

$$Y = X_1\beta_1 + X_2\beta_2 + e \tag{7.7}$$

analogous to (5.1) and (6.1). For $X_1 = X$, $\beta_1 = \beta$ ($X_2 = 0$), (7.7) becomes Equation (5.1), and for $\beta_1 = \mu$, $X_1 = 1_N$, $X_2 = Z$, $\beta_2 = \gamma$, (7.7) becomes Equation (6.2). Hereafter we are interested in the so-called real mixed models, where $\beta_1$ contains at least one further component besides $\mu$ and $X_2$ and $\beta_2$ are not zero. New with the real mixed models is the fact that fixed effects are estimated or tested and variance components are estimated.

Do not confuse mixed models with mixed classifications.


7.3.1 An Example for the Balanced Case 


Example 7.5 (Mixed model in the two-way cross-classification with equal 
subclass numbers) 

Two cross-classified factors A (fixed) and B and their interactions AB in
Equation (7.6) lead us to

\[ A_1 = A, \quad A_2 = B, \quad A_3 = AB. \]

Then s = 1 and r = 3 and we put $a_1 = a$, $a_2 = b$, and consequently $a_3 = ab$.
Because we have equal subclass numbers with n > 1, it follows N = abn.
Equations (7.6) and (7.7) become

\[ y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_{ijk}. \tag{7.8} \]

From the side conditions of Definition 7.2, we get side conditions for (7.8). Let
additionally (case I)

\[ \mathrm{var}(b_j) = \sigma_b^2 \text{ for all } j, \quad \mathrm{cov}(b_j, b_k) = 0 \text{ for all } j, k \text{ with } j \ne k, \]
\[ \mathrm{var}((ab)_{ij}) = \sigma_{ab}^2 \text{ for all } i, j, \quad \sum_{i=1}^{a} a_i = 0, \]
\[ \mathrm{cov}(b_j, (ab)_{ij}) = \mathrm{cov}(b_j, e_{ijk}) = \mathrm{cov}((ab)_{ij}, e_{ijk}) = 0 \]

and

\[ \mathrm{cov}((ab)_{ij}, (ab)_{i'j}) = 0 \quad (i \ne i'). \]

The columns SS, df and MS in the corresponding ANOVA table are model independent
and given in Table 5.13. The expectations of the MS for model (7.8) can
be found in Table 7.1.

Table 7.1 Expectations of the MS in Table 5.13 for a mixed model (levels of A fixed) for two side
conditions.

Column 2: E(MS), if $\mathrm{cov}((ab)_{ij}, (ab)_{i'j}) = 0$ (Case I)
  Between the levels of A:   $\frac{bn}{a-1}\sum_{i=1}^{a}(a_i - \bar{a})^2 + n\sigma_{ab}^2 + \sigma^2$
  Between the levels of B:   $an\,\sigma_b^2 + n\sigma_{ab}^2 + \sigma^2$
  Interactions A × B:        $n\sigma_{ab}^2 + \sigma^2$
  Residual:                  $\sigma^2$

Last column: E(MS), if $\sum_{i=1}^{a}(ab)_{ij} = 0$ for all j (Case II)
  Between the levels of A:   $\frac{bn}{a-1}\sum_{i=1}^{a} a_i^2 + \frac{an}{a-1}\sigma_{ab}^2 + \sigma^2$
  Between the levels of B:   $an\,\sigma_b^2 + \sigma^2$
  Interactions A × B:        $\frac{an}{a-1}\sigma_{ab}^2 + \sigma^2$
  Residual:                  $\sigma^2$

If the additional side conditions (case II) are

\[ \sum_{i=1}^{a}(ab)_{ij} = 0 \text{ for all } j, \tag{7.9} \]

the term $\bar{a} = \frac{1}{a}\sum_{i=1}^{a} a_i$ vanishes and $(ab)_{ij}$ and $(ab)_{i'j}$ ($i \ne i'$; j = 1, ..., b) are
correlated; the covariance is $\mathrm{cov}((ab)_{ij}, (ab)_{i'j}) = \sigma^*_{ab}$ for all j and $i \ne i'$.
Because

\[ 0 = \mathrm{var}(0) = \mathrm{var}\left(\sum_{i=1}^{a}(ab)_{ij}\right) = a\,\sigma_{ab}^2 + a(a-1)\,\sigma^*_{ab}, \]

the relation $\sigma^*_{ab} = -\frac{1}{a-1}\sigma_{ab}^2$ follows.
if! 

The conditions (7.9) lead to the E(MS) in the last column in Table 7.1. Searle
(1971) clearly recorded the relations between the two cases. He showed that
$\sigma_b^2$ in the two cases changed in meaning. To show this, we write down
Equation (7.8) separately for both cases: (7.8) as it stands for case I and (7.8) with
effects marked by '' for case II (side conditions (7.9)):

\[ y_{ijk} = \mu'' + a_i'' + b_j'' + (ab)_{ij}'' + e_{ijk}. \]

(7.8) can be written as

\[ y_{ijk} = (\mu + \bar{a}) + (a_i - \bar{a}) + \left(b_j + \overline{(ab)}_{.j}\right) + \left((ab)_{ij} - \overline{(ab)}_{.j}\right) + e_{ijk} \]

with $\overline{(ab)}_{.j} = a^{-1}\sum_{i=1}^{a}(ab)_{ij}$. Then we obtain $\mu'' = \mu + \bar{a}$, $a_i'' = a_i - \bar{a}$, $b_j'' = b_j + \overline{(ab)}_{.j}$
and $(ab)_{ij}'' = (ab)_{ij} - \overline{(ab)}_{.j}$. Then we have

\[ \sigma^2_{b''} = \sigma_b^2 + \frac{1}{a}\sigma_{ab}^2, \qquad \sigma^2_{ab''} = \frac{a-1}{a}\sigma_{ab}^2 \]

and

\[ \sigma^*_{ab} = \mathrm{cov}\left((ab)_{ij}'', (ab)_{i'j}''\right) = -\frac{1}{a}\sigma_{ab}^2 = -\frac{1}{a-1}\sigma^2_{ab''}. \]
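The relations between the two cases can be checked numerically. The following is a minimal sketch, assuming uncorrelated case I interactions with common variance for one fixed level j; the function name `centered_interaction_cov` is hypothetical and only serves the illustration. It builds the covariance matrix of the centred effects $(ab)_{ij}'' = (ab)_{ij} - \overline{(ab)}_{.j}$ with exact fractions.

```python
from fractions import Fraction


def centered_interaction_cov(a, sigma2_ab):
    """Covariance matrix of (ab)''_ij = (ab)_ij - mean_i (ab)_ij for fixed j.

    Assumes (case I) that the a interaction effects are uncorrelated with
    common variance sigma2_ab, so the result is sigma2_ab * (I - J/a).
    """
    return [
        [sigma2_ab * ((1 if i == k else 0) - Fraction(1, a)) for k in range(a)]
        for i in range(a)
    ]


a = 4                                   # hypothetical number of levels of A
C = centered_interaction_cov(a, Fraction(1))
var_case2 = C[0][0]                     # variance of (ab)''_ij: (a-1)/a
cov_case2 = C[0][1]                     # covariance for i != i': -1/a
total = sum(sum(row) for row in C)      # variance of the sum over i
```

With a = 4 the variance is 3/4 and the covariance is -1/4, that is, the covariance equals minus the variance divided by a - 1, and the variance of the sum over i is zero, in line with side condition (7.9).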


The variance components $\sigma_b^2$, $\sigma_{ab}^2$ and $\sigma^2$ can be estimated due to balancedness.
For case I from column 2 in Table 7.1 by the ANOVA method,

\[ s^2 = MS_{res}, \qquad s_{ab}^2 = \frac{1}{n}(MS_{AB} - MS_{res}), \qquad s_b^2 = \frac{1}{an}(MS_B - MS_{AB}). \tag{7.10} \]

For case II from the last column in Table 7.1,

\[ s^2 = MS_{res}, \qquad s_{ab}^2 = \frac{a-1}{an}(MS_{AB} - MS_{res}), \qquad s_b^2 = \frac{1}{an}(MS_B - MS_{res}). \tag{7.10a} \]

Analogously to this example, cross-classified, nested or mixed-classified balanced
designs of the mixed model can be treated; the general statements of
Section 6.3 are still valid.
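The estimators (7.10) and (7.10a) are simple functions of the mean squares. A minimal sketch follows; the function name and the numerical mean squares are hypothetical and not taken from the book.

```python
def two_way_mixed_estimates(ms_b, ms_ab, ms_res, a, n, case="I"):
    """ANOVA-method estimates for model (7.8) with A fixed and B random.

    case "I"  uses (7.10), based on column 2 of Table 7.1;
    case "II" uses (7.10a), based on the last column of Table 7.1.
    """
    s2 = ms_res
    if case == "I":
        s2_ab = (ms_ab - ms_res) / n
        s2_b = (ms_b - ms_ab) / (a * n)
    elif case == "II":
        s2_ab = (a - 1) / (a * n) * (ms_ab - ms_res)
        s2_b = (ms_b - ms_res) / (a * n)
    else:
        raise ValueError("case must be 'I' or 'II'")
    return {"s2": s2, "s2_ab": s2_ab, "s2_b": s2_b}


# Hypothetical mean squares for a = 4 levels of A and n = 2 observations per cell.
est1 = two_way_mixed_estimates(ms_b=50.0, ms_ab=20.0, ms_res=10.0, a=4, n=2, case="I")
est2 = two_way_mixed_estimates(ms_b=50.0, ms_ab=20.0, ms_res=10.0, a=4, n=2, case="II")
```

For these numbers the two sets of estimates obey the same relations as the parameters in Searle's comparison: the case II estimate of the B component equals the case I estimate plus s_ab^2/a, and the case II interaction estimate is (a - 1)/a times the case I one.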


7.3.2 The Unbalanced Case 


In unbalanced cases we use Henderson's method III (Henderson, 1953), starting
with the model equation (7.7). A quadratic form in Y must be found so that
its expectation is independent of $\beta_1$ and contains only the variance components we are
looking for, if the covariances between the random effects of each factor
are zero.


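Theorem 7.1 For Y let the mixed model in Definition 7.2 have the form (7.7).
With $X = (X_1, X_2)$, the expectation of the quadratic form

\[ Y^T\left[X(X^TX)^-X^T - X_1(X_1^TX_1)^-X_1^T\right]Y = Y^T(U - V)Y \]

depends only on the unknown $\sigma^2$ and on $\mathrm{var}(\beta_2)$, but not on $\beta_1$, even if $\beta_1$ is
random.

Proof: We rewrite (6.5) for Y in (7.7) as

\[ E(Y^TAY) = \mathrm{tr}\left[X^TAX\,E(\beta\beta^T)\right] + \sigma^2\,\mathrm{tr}(A) \tag{7.11} \]

with $\beta = (\beta_1^T, \beta_2^T)^T$. Due to the idempotence of $X(X^TX)^-X^T$ and
$X^TX(X^TX)^-X^TX = X^TX$, we get

\[ E(Y^TUY) = E\left[Y^TX(X^TX)^-X^TY\right] = \mathrm{tr}\left[(X^TX)E(\beta\beta^T)\right] + \sigma^2\,\mathrm{rk}(X) \]

and

\[ E(Y^TVY) = E\left[Y^TX_1(X_1^TX_1)^-X_1^TY\right] = \mathrm{tr}\left[X^TX_1(X_1^TX_1)^-X_1^TX\,E(\beta\beta^T)\right] + \sigma^2\,\mathrm{rk}(X_1). \]

We write

\[ X^TX = \begin{pmatrix} X_1^TX_1 & X_1^TX_2 \\ X_2^TX_1 & X_2^TX_2 \end{pmatrix} \]

with $(X_1, X_2) = X$ and obtain

\[ E(Y^TUY) = \mathrm{tr}\left[\begin{pmatrix} X_1^TX_1 & X_1^TX_2 \\ X_2^TX_1 & X_2^TX_2 \end{pmatrix} E(\beta\beta^T)\right] + \sigma^2\,\mathrm{rk}(X) \tag{7.12} \]

and, because $X_1^TX_1(X_1^TX_1)^-X_1^T = X_1^T$,

\[ E(Y^TVY) = \mathrm{tr}\left[\begin{pmatrix} X_1^TX_1 \\ X_2^TX_1 \end{pmatrix}(X_1^TX_1)^-\left(X_1^TX_1,\ X_1^TX_2\right)E(\beta\beta^T)\right] + \sigma^2\,\mathrm{rk}(X_1) \]
\[ = \mathrm{tr}\left[\begin{pmatrix} X_1^TX_1 & X_1^TX_2 \\ X_2^TX_1 & X_2^TX_1(X_1^TX_1)^-X_1^TX_2 \end{pmatrix} E(\beta\beta^T)\right] + \sigma^2\,\mathrm{rk}(X_1). \]

With $\beta = (\beta_1^T, \beta_2^T)^T$ we obtain

\[ E\left\{Y^T\left[X(X^TX)^-X^T - X_1(X_1^TX_1)^-X_1^T\right]Y\right\} = \mathrm{tr}\left\{X_2^T\left[I_N - X_1(X_1^TX_1)^-X_1^T\right]X_2\,E(\beta_2\beta_2^T)\right\} + \sigma^2\left[\mathrm{rk}(X) - \mathrm{rk}(X_1)\right], \tag{7.13} \]

and this completes the proof.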


Henderson's method III uses Theorem 7.1.

The partitioning of $\beta$ into two vector components $\beta_1$, $\beta_2$ (and of X into $X_1$, $X_2$)
does not depend on whether $\beta_1$ contains only fixed effects or not. Theorem 7.1 is valid for
all partitionings of $\beta$, as long as $\mathrm{rk}(X) - \mathrm{rk}(X_1) > 0$.

We now build for a mixed model with r - s random components all quadratic
forms of type $Y^T(U - V)Y$, in which $\beta_2$ has one, two, ..., r - s random groups
of elements and, correspondingly, $X_2$ has one, two, ..., r - s groups of columns. Together
with $E(MS_{res}) = \sigma^2$, the expectations of these quadratic forms result in
r - s + 1 equations from which the variance components can be obtained, as long as
$E(\beta_2\beta_2^T)$ is a diagonal matrix.

In these equations we replace $E[Y^T(U - V)Y]$ by $Y^T(U - V)Y$ and the variance
components $\sigma_i^2$ ($\sigma^2$) with their estimators $s_i^2$ ($s^2$) and obtain equations with the
estimators of the variance components as unknown quantities. The estimators are
unbiased and independent of the fixed effects, but the estimates can become
negative, which of course is nonsense for a variance. If we replace the negative
estimates by zero, the unbiasedness property is no longer true.
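The key property behind Theorem 7.1, namely that the quadratic form $Y^T(U - V)Y$ does not change when $X_1\beta_1$ is added to Y, can be illustrated numerically. The sketch below assumes NumPy is available, uses the Moore-Penrose inverse as one possible generalised inverse, and works with a small hypothetical balanced design (A fixed with 2 levels, B random with 3 levels, 2 observations per cell, no interaction); all names and data are illustrative only.

```python
import numpy as np


def projector(M):
    """Orthogonal projector M (M^T M)^- M^T onto the column space of M."""
    return M @ np.linalg.pinv(M.T @ M) @ M.T


def henderson3_quadratic_form(y, X1, X2):
    """Quadratic form Y^T (U - V) Y of Theorem 7.1 with X = (X1, X2)."""
    X = np.hstack([X1, X2])
    return float(y @ (projector(X) - projector(X1)) @ y)


# Hypothetical balanced layout: one row per observation with (i, j) labels.
a, b, n = 2, 3, 2
cells = [(i, j) for i in range(a) for j in range(b) for _ in range(n)]
X1 = np.array([[1.0] + [float(i == k) for k in range(a)] for i, _ in cells])
X2 = np.array([[float(j == k) for k in range(b)] for _, j in cells])

rng = np.random.default_rng(0)
y = rng.normal(size=len(cells))

q = henderson3_quadratic_form(y, X1, X2)
# Adding any X1 @ beta1 to y leaves the quadratic form unchanged.
q_shifted = henderson3_quadratic_form(y + X1 @ np.array([5.0, -2.0, 3.0]), X1, X2)
rank_diff = np.linalg.matrix_rank(np.hstack([X1, X2])) - np.linalg.matrix_rank(X1)

# In this balanced layout the form coincides with the usual SS between levels of B.
col_sums = X2.T @ y
ss_b = float(col_sums @ col_sums) / (a * n) - y.sum() ** 2 / len(cells)
```

Here `rank_diff` is rk(X) - rk(X_1) = b - 1 = 2, the factor of $\sigma^2$ in (7.13). In an unbalanced design the same two functions can be reused unchanged; only the design matrices differ.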

In mixed models, it may happen that variance components of the random
effects as well as the fixed effects have to be estimated. If the distribution of
Y is normal, we can use the maximum likelihood method. The likelihood function
is differentiated with respect to the fixed effects and the variance components.
The derivatives are set to zero, and we get simultaneous equations that can be
solved by iteration. The formulae and proposals for numerical solutions of the
simultaneous equations are given in Hartley and Rao (1967). The numerical
solution is laborious (see the remarks about REML in Section 6.2.1.2).


7.4 Tests for Fixed Effects and Variance Components 


We assume now that all random entries in (7.6) are normally distributed and
that the side conditions (7.9) are defined so that all fixed effects are estimable.
W.l.o.g. we restrict ourselves to i = 1 and i = s + 1:

\[ H_{0F}: \gamma_{A_1} = 0_{a_1} \quad \text{against} \quad H_{AF}: \gamma_{A_1} \ne 0_{a_1} \tag{7.14} \]

and

\[ H_{0V}: \sigma^2_{s+1} = 0 \quad \text{against} \quad H_{AV}: \sigma^2_{s+1} \ne 0. \]

By

\[ SS_i = Y^TT_iY \quad (i = 1, \ldots, r+1) \tag{7.15} \]

we denote the SS of factor $A_i$ ($SS_{r+1} = SS_{res}$), where the $T_i$ are idempotent
matrices of rank $\mathrm{rk}(T_i) = f_i$. In the special cases of Section 7.5, $SS_i$ and $f_i$ are
given in the ANOVA tables. The magnitudes

\[ MS_i = \frac{1}{f_i}SS_i \quad (i = 1, \ldots, r+1) \tag{7.16} \]

are the corresponding MS, with $MS_{r+1} = MS_{res}$ and $f_{r+1} = f_{res}$.

To test $H_{0F}$ we construct a test statistic

\[ F_1 = \frac{Y^TT_1Y / f_1}{\sum_{i=s+1}^{r+1} c_i\,MS_i} = \frac{MS_1}{\sum_{i=s+1}^{r+1} c_i\,MS_i} \tag{7.17} \]

so that, if $H_{0F}$ is true, for suitable $c_i$

\[ E(MS_1 \mid H_{0F}) = E\left(\sum_{i=s+1}^{r+1} c_i\,MS_i\right). \]

A corresponding construction is used if the test statistic for $H_{0V}$ is given by

\[ F_{s+1} = \frac{MS_{s+1}}{\sum_{j=s+2}^{r+1} k_j\,MS_j}, \tag{7.18} \]

where we have to choose the $k_j$ so that

\[ E(MS_{s+1} \mid H_{0V}) = E\left(\sum_{j=s+2}^{r+1} k_j\,MS_j\right). \]

The degrees of freedom of the test statistics (7.17) and (7.18) in all cases, in
which only one of the $c_i$ (i = s + 1, ..., r + 1) or $k_j$ (j = s + 2, ..., r + 1) differs from zero
and is equal to 1, are given by $(f_1, f_i)$ or $(f_{s+1}, f_j)$. In all other cases we approximate
the degrees of freedom by $(f_1, f^*)$ or $(f_{s+1}, f^*)$ using the corollary of
Lemma 6.3. For testing $H_{0F}$ and $H_{0V}$, Seifert (1980, 1981) used another approach,
leading in mixed models to exact $\alpha$-tests and in balanced cases to simple formulae.
The principle of Seifert was to use test statistics that are ratios of
two independent quadratic forms $Y^TB_1Y$ and $Y^TB_2Y$, where $Y^TB_2Y$ is centrally
$\chi^2$-distributed with $g_2$ degrees of freedom and $Y^TB_1Y$, if $H_{0F}$ ($H_{0V}$) is true,
is centrally $\chi^2$-distributed with $g_1$ degrees of freedom. Then

\[ F = \frac{Y^TB_1Y \cdot g_2}{Y^TB_2Y \cdot g_1} \tag{7.19} \]

is, if $H_{0F}$ ($H_{0V}$) is true, centrally F-distributed with $g_1$ and $g_2$ degrees of freedom.
This procedure may also be used for model I and model II (s = r and s = 0, respectively).


7.5 Variance Component Estimation and Tests 
of Hypotheses in Special Mixed Models 


Below we discuss simple cases (mainly balanced designs) of the two- and three-way
analyses with mixed models. If we discuss statistical tests of fixed effects or
variance components, we assume that all random variables in (7.6) are normally
distributed and that side conditions are fulfilled so that all fixed effects are
estimable.


7.5.1 Two-Way Cross-Classification 


W.l.o.g. we assume that in a mixed model of the two-way cross-classification, 
factor A is fixed, and factor B is random as in model equation (7.8). The variance 
component estimation already was handled in Example 7.5. 

The SS, MS and df are given in Table 5.10; there we have single subclass
numbers and R = AB. We have s = 1, r = 2, $a_1 = a$, $a_2 = b$, $a_3 = n$, and the model
equation is (7.8) for $(ab)_{ij} = 0$.

The null hypotheses that can be tested are

$H_{01}$: 'All $a_i$ are zero'.
$H_{02}$: '$\sigma_b^2 = 0$'.
$H_{03}$: '$\sigma_{ab}^2 = 0$'.

If $H_{01}$ is true, then $E(MS_A)$ equals $E(MS_{AB})$, and we test $H_{01}$ using

\[ F_A = \frac{MS_A}{MS_{AB}}, \]

which has under $H_{01}$ an F-distribution with (a - 1) and (a - 1)(b - 1) degrees of
freedom.

To find the minimum size of the experiment that will satisfy given precision
requirements, we must remember that only the degrees of freedom of the corresponding
F-statistic influence the power of the test and by this the size of the
experiment. To test the hypothesis $H_{01}$ that the fixed factor has no influence
on the observations, we have (a - 1) and (a - 1)(b - 1) degrees of freedom of numerator
and denominator, respectively. Thus the subclass number n does not influence
the size needed and therefore should be chosen as small as possible. If we
know that there are no interactions, we choose n = 1, but if interactions may occur
we choose n = 2. Because the number a of levels of the factor under test is fixed, we
can only choose b, the size of the sample of B-levels, to fulfil precision requirements.

Here is an example.
Example 7.6 We want to test the null hypothesis that six wheat varieties do
not differ in their yields.

For the precision requirements $\alpha = 0.05$, $\beta = 0.2$, $\sigma = 1$, $\delta = 1.6$, we obtain a
maximin number of levels of factor B as b = 12.

For the experiment we randomly selected 12 farms. The varieties are the levels
of a fixed factor A and the twelve farms are levels of a random factor B. Both are
cross-classified. The yield in dt/ha was measured. The results are shown in
Table 7.2.

Table 7.2 Yields of 6 varieties tested on 12 farms.

              A: varieties
B: farms      1     2     3     4     5     6
 1           32    48    25    33    48    29
 2           28    52    25    38    27    27
 3           30    47    34    44    38    31
 4           44    55    28    39    21    31
 5           43    53    26    38    30    33
 6           48    57    33    37    36    26
 7           42    64    40    53    38    27
 8           42    64    42    41    29    33
 9           39    64    47    47    23    32
10           44    59    34    54    33    31
11           40    58    27    50    36    30
12           42    57    32    46    36    35


We can perform the two-way ANOVA with SPSS by using the procedure: 


Analyze 
General Linear Model 
Univariate 


We input ‘variety’ as fixed factor, ‘farm’ as random factor and ‘yield’ as the
dependent variable as shown in Figure 7.1. With the model key, we use a model
without interactions. The results of the SPSS calculations are shown in
Table 7.3. Note that because we have single-cell observations, the residual is
equal to the interaction AB.

We obtained $F_A = MS_A/MS_{res} = 1139.247/33.144 = 34.372$, and therefore we found
significant differences between the varieties. This follows directly from
Table 7.3 because sig. < 0.05. For $F_B$ we obtain $F_B = 2.007$ with sig. = 0.045 in
Table 7.3, and this means that the variance component for the farms is, with
$\alpha = 0.05$, significantly different from zero, although only narrowly.

To estimate the variance component for the farms, we use in SPSS: 


Analyze 
General Linear Models 
Variance Components 


Again we use a model without interactions and select the ANOVA method. The
result is shown in Table 7.4.

Figure 7.1 The factors of Example 7.6. Source: Reproduced with permission of IBM.


Table 7.3 ANOVA table for Example 7.6.

Tests of between-subjects effects
Dependent variable: yield

                          Type III sum
Source                    of squares      df    Mean square     F          Sig.
Intercept   Hypothesis    110842.014       1    110842.014      1666.070   .000
            Error            731.819      11        66.529 (a)
Variety     Hypothesis      5696.236       5      1139.247        34.372   .000
            Error           1822.931      55        33.144 (b)
Farm        Hypothesis       731.819      11        66.529         2.007   .045
            Error           1822.931      55        33.144 (b)

(a) MS(farm)
(b) MS(error)

Source: Reproduced with permission of IBM.

Table 7.4 Variance component estimates of Example 7.6.

Variance estimates

Component      Estimate
Var(farm)         5.564
Var(error)       33.144

Dependent variable: yield. Method: ANOVA (type I sum of squares).

Source: Reproduced with permission of IBM.
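The SPSS results of Tables 7.3 and 7.4 can be recomputed directly from the data in Table 7.2. The following plain Python sketch (variable names are our own) uses the usual formulae for a two-way cross-classification with one observation per cell and the ANOVA-method estimator of the farm variance component in a model without interactions.

```python
# Yields from Table 7.2: one row per farm (B), one column per variety (A).
yields = [
    [32, 48, 25, 33, 48, 29],
    [28, 52, 25, 38, 27, 27],
    [30, 47, 34, 44, 38, 31],
    [44, 55, 28, 39, 21, 31],
    [43, 53, 26, 38, 30, 33],
    [48, 57, 33, 37, 36, 26],
    [42, 64, 40, 53, 38, 27],
    [42, 64, 42, 41, 29, 33],
    [39, 64, 47, 47, 23, 32],
    [44, 59, 34, 54, 33, 31],
    [40, 58, 27, 50, 36, 30],
    [42, 57, 32, 46, 36, 35],
]
b = len(yields)        # number of farms (random factor B)
a = len(yields[0])     # number of varieties (fixed factor A)
N = a * b

grand_total = sum(sum(row) for row in yields)
correction = grand_total ** 2 / N                      # SS of the intercept

ss_farm = sum(sum(row) ** 2 for row in yields) / a - correction
ss_variety = sum(sum(col) ** 2 for col in zip(*yields)) / b - correction
ss_total = sum(y * y for row in yields for y in row) - correction
ss_res = ss_total - ss_farm - ss_variety               # residual = interaction AB

ms_variety = ss_variety / (a - 1)
ms_farm = ss_farm / (b - 1)
ms_res = ss_res / ((a - 1) * (b - 1))

f_variety = ms_variety / ms_res
f_farm = ms_farm / ms_res
s2_farm = (ms_farm - ms_res) / a                       # ANOVA-method estimate
```

Rounded to three decimals, the sketch reproduces the sums of squares 5696.236, 731.819 and 1822.931, the F-values 34.372 and 2.007 and the variance component estimate 5.564 for the farms.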


7.5.2 Two-Way Nested Classification B <A 


Let the model equation of the nested classification be

\[ y_{ijk} = \mu + a_i + b_{j(i)} + e_{k(i,j)} \quad (i = 1, \ldots, a;\ j = 1, \ldots, b;\ k = 1, \ldots, n) \tag{7.20} \]

with the side conditions that the levels of A, B and the residuals stem from finite
level populations and for all i and i, j, we assume

1 N(B) i. N(res) 
-E b — b = 
(ba) (B 24° ~ Nitra) D. eK) 

N(A 

! og, (7.21) 

NCA) eee se 

® 
waa =O; in a’ 
N(res) 


N(ves) Nessie CK) = 


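Further, all covariances between the components in (7.20) (e.g. between $a_i$ and
$b_{j(i)}$) shall be zero. We use the rules in Section 7.2 to generate the ANOVA table.
This ANOVA table has three rows: levels of A, levels of B in A, and residual. The
ANOVA table is Table 7.5. By rule 2 the degrees of freedom have been found
and by rule 3, the SS.

Table 7.5 ANOVA table of a two-way balanced nested classification for a model with finite level
populations.

Between the levels of A
  SS:     $SS_A = \frac{1}{bn}\sum_{i=1}^{a} Y_{i..}^2 - \frac{Y_{...}^2}{N}$
  df:     a - 1
  MS:     $MS_A = \frac{SS_A}{a-1}$
  E(MS):  $bn\,\sigma_a^2 + n\left(1 - \frac{b}{N(B)}\right)\sigma^2_{b \text{ in } a} + \left(1 - \frac{n}{N(\mathrm{res})}\right)\sigma^2$

Between the levels of B within A
  SS:     $SS_{B\,in\,A} = \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij.}^2 - \frac{1}{bn}\sum_{i=1}^{a} Y_{i..}^2$
  df:     a(b - 1)
  MS:     $MS_{B\,in\,A} = \frac{SS_{B\,in\,A}}{a(b-1)}$
  E(MS):  $n\,\sigma^2_{b \text{ in } a} + \left(1 - \frac{n}{N(\mathrm{res})}\right)\sigma^2$

Residual
  SS:     $SS_{res} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij.}^2$
  df:     ab(n - 1)
  MS:     $MS_{res} = \frac{SS_{res}}{ab(n-1)}$
  E(MS):  $\sigma^2$

We use rule 4 to determine the E(MS) using the table below:

                 i             j             k
a_i              1 - a/N(A)    b             n
b_{j(i)}         1             1 - b/N(B)    n                    (7.22)
e_{k(i,j)}       1             1             1 - n/N(res)

Then the E(MS) are

\[ E(MS_A) = bn\,\sigma_a^2 + n\left(1 - \frac{b}{N(B)}\right)\sigma^2_{b \text{ in } a} + \left(1 - \frac{n}{N(\mathrm{res})}\right)\sigma^2, \]
\[ E(MS_{B\,in\,A}) = n\,\sigma^2_{b \text{ in } a} + \left(1 - \frac{n}{N(\mathrm{res})}\right)\sigma^2, \tag{7.23} \]
\[ E(MS_{res}) = \sigma^2. \]

If N(res) tends to infinity, then we obtain

\[ E(MS_A) = bn\,\sigma_a^2 + n\left(1 - \frac{b}{N(B)}\right)\sigma^2_{b \text{ in } a} + \sigma^2, \]
\[ E(MS_{B\,in\,A}) = n\,\sigma^2_{b \text{ in } a} + \sigma^2, \tag{7.24} \]
\[ E(MS_{res}) = \sigma^2. \]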


Putting in (7.24) N(B) = b, we obtain the E(MS) of model I in Table 5.19 for the
balanced case ($n_{ij} = n$, $b_i = b$). For N(B) → ∞ we obtain from (7.24) the E(MS) of
model II in Table 6.5 for the balanced case (with a corresponding definition of
$\sigma_a^2$ and $\sigma^2_{b \text{ in } a}$). Nevertheless, here we are interested in mixed models. In the
nested classification, there exist two mixed models.


7.5.2.1 Levels of A Random 
Let the levels of A be randomly selected from the level population and the levels
of B fixed; the model equation is then

\[ y_{ijk} = \mu + a_i + b_{j(i)} + e_{k(i,j)} \quad (i = 1, \ldots, a;\ j = 1, \ldots, b;\ k = 1, \ldots, n) \tag{7.25} \]

with corresponding side conditions.
(Expectations of all random variables are zero, $\mathrm{var}(a_i) = \sigma_a^2$ for all i;
$\mathrm{var}(e_{k(i,j)}) = \sigma^2$ for all i, j, k; all covariances between different random variables
on the right-hand side of (7.25) are zero, $\sum_{j=1}^{b} b_{j(i)} = 0$.)
The E(MS) we get from (7.24) for N(B) = b is given in column 2 of Table 7.6.
The estimators of the variance components are

\[ s^2 = MS_{res}, \qquad s_a^2 = \frac{1}{bn}(MS_A - MS_{res}). \tag{7.26} \]


Table 7.6 E(MS) of mixed models of the two-way nested classification.

Column 2: A random, B fixed
  Between the levels of A:                       $bn\,\sigma_a^2 + \sigma^2$
  Between the levels of B within the levels of A: $\frac{n}{a(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b} b_{j(i)}^2 + \sigma^2$
  Residual:                                      $\sigma^2$

Last column: A fixed, B random
  Between the levels of A:                       $\frac{bn}{a-1}\sum_{i=1}^{a} a_i^2 + n\,\sigma^2_{b \text{ in } a} + \sigma^2$
  Between the levels of B within the levels of A: $n\,\sigma^2_{b \text{ in } a} + \sigma^2$
  Residual:                                      $\sigma^2$

7.5.2.2 Levels of B Random
If the levels of A are fixed and those of B random, the model equation is

\[ y_{ijk} = \mu + a_i + b_{j(i)} + e_{k(i,j)} \quad (i = 1, \ldots, a;\ j = 1, \ldots, b;\ k = 1, \ldots, n) \tag{7.27} \]

with the side conditions

\[ \mathrm{var}(b_{j(i)}) = \sigma^2_{b \text{ in } a} \text{ for all } i, j, \qquad \mathrm{var}(e_{k(i,j)}) = \sigma^2 \text{ for all } i, j, k, \qquad \sum_{i=1}^{a} a_i = 0, \]

and the expectations of the random variables on the right-hand side of (7.27)
and all covariances between the random variables on the right-hand side of
(7.27) are equal to zero.

The E(MS) for this case follows from (7.24) and can be found in the last column
of Table 7.6. The estimators of $\sigma^2$ and $\sigma^2_{b \text{ in } a}$ are

\[ s^2 = MS_{res}, \qquad s^2_{b \text{ in } a} = \frac{1}{n}(MS_{B\,in\,A} - MS_{res}). \tag{7.28} \]
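The estimators (7.26) and (7.28) for the two mixed models of the nested classification can be written as one small helper. This is only a sketch; the function name and the mean squares used in the example are hypothetical.

```python
def nested_mixed_estimates(ms_a, ms_b_in_a, ms_res, b, n, random_factor):
    """ANOVA-method estimates for the balanced two-way nested classification.

    random_factor "A": A random, B fixed, estimators (7.26).
    random_factor "B": A fixed, B random, estimators (7.28).
    """
    if random_factor == "A":
        return {"s2": ms_res, "s2_a": (ms_a - ms_res) / (b * n)}
    if random_factor == "B":
        return {"s2": ms_res, "s2_b_in_a": (ms_b_in_a - ms_res) / n}
    raise ValueError("random_factor must be 'A' or 'B'")


# Hypothetical mean squares with b = 3 levels of B per level of A and n = 2.
est_a = nested_mixed_estimates(40.0, 16.0, 4.0, b=3, n=2, random_factor="A")
est_b = nested_mixed_estimates(40.0, 16.0, 4.0, b=3, n=2, random_factor="B")
```

Which mean square is subtracted follows directly from the corresponding column of Table 7.6: in both models the residual mean square is the only other term in the expectation.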
The null hypothesis that the effects of all the levels of factor A are equal is tested
using the test statistic

\[ F_A = \frac{MS_A}{MS_{B\,in\,A}}, \]

which under $H_0$ has an F-distribution with (a - 1) and a(b - 1) degrees of
freedom.
For the mixed model we use conditions analogous to case II of Section 7.3.1


but corresponding to the remarks below. In Example 7.1 in place of => oy 
we write the quasi-variance component o?,,- Then we get 


E(MS,) = a; +No, +07, 
ab 


E(MSz) = a + E(MSag) =n0%, + 0°. 


To find the minimum size of the experiment, which will satisfy given precision 
requirements, we have to find the minimum number b of levels of factor B. 
Because in nested models we have no interactions, we can fix n = 1. Consider 
the following example. 


Example 7.7 It shall be tested whether an amino acid supplementation in
the rearing rations of young boars (7 months old) causes a significant
increase in sperm production (total number of spermatozoa per ejaculate in
billions) of the boars. There were a = 2 feeding groups (with and without
supplementation) formed; in each group $b_1 = b_2 = b$ randomly selected boars
from an animal population have been investigated. From each boar spermatozoa
from c ejaculates have been counted. This is a two-way nested classification
with factor feeding (fixed) and boar (random). In the SPSS procedure
it reads:


Analyze 
General Linear Model 
Univariate 


We click the model button, and then under ‘build terms’ we enter both factors as
‘main effects’; under ‘sum of squares’ we choose ‘type I’. After clicking ‘continue’,
click on ‘paste’, and in the syntax window change ‘/design feeding boar’ to
‘/design feeding boar(feeding)’ (signifying boar nested in feeding). The new
syntax is


DATASET ACTIVATE DataSet1.
UNIANOVA Yield BY Feeding Boar
  /RANDOM=Boar
  /METHOD=SSTYPE(1)
  /INTERCEPT=EXCLUDE
  /CRITERIA=ALPHA(0.05)
  /DESIGN=Feeding Boar(Feeding).


Then start the programme with ‘Run’. 
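For readers without SPSS, the quantities of a balanced two-way nested ANOVA can be computed by hand. The sketch below is an illustration only: the data are invented (two feeding groups, two boars per group, two counts per boar) and are not the data of Example 7.7, and the function name `nested_anova` is hypothetical. It uses the SS formulae of Table 7.5, the test statistic $F_A = MS_A/MS_{B\,in\,A}$ and the estimator (7.28).

```python
def nested_anova(data):
    """Balanced two-way nested ANOVA (B within A), A fixed and B random.

    data[i][j] is the list of n observations of level j of B within level i of A.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    N = a * b * n
    grand = sum(y for group in data for cell in group for y in cell)
    correction = grand ** 2 / N
    ss_a = sum(sum(sum(cell) for cell in group) ** 2 for group in data) / (b * n) - correction
    cell_term = sum(sum(cell) ** 2 for group in data for cell in group) / n
    ss_b_in_a = cell_term - correction - ss_a
    ss_res = sum(y * y for group in data for cell in group for y in cell) - cell_term
    ms_a = ss_a / (a - 1)
    ms_b_in_a = ss_b_in_a / (a * (b - 1))
    ms_res = ss_res / (a * b * (n - 1))
    return {
        "ss_a": ss_a, "ss_b_in_a": ss_b_in_a, "ss_res": ss_res,
        "f_a": ms_a / ms_b_in_a,                 # tests the fixed factor A
        "s2_b_in_a": (ms_b_in_a - ms_res) / n,   # estimator (7.28)
        "s2": ms_res,
    }


# Invented data: 2 feeding groups x 2 boars per group x 2 counts per boar.
example = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[9.0, 11.0], [13.0, 15.0]],
]
result = nested_anova(example)
```

For the invented data the sums of squares are 128, 32 and 8, so that $F_A$ = 128/16 = 8 with 1 and 2 degrees of freedom, and the variance component of the nested factor is estimated as (16 - 2)/2 = 7.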


7.5.3 Three-Way Cross-Classification 


To calculate the variance components, we must calculate, as in Example 7.4, the
quasi-variance components; we call them variance components for short.

Following the procedure in the sections above, the reader can
derive the side conditions of the models. We only give the E(MS) for a
model with finite level populations and for both types of balanced mixed
models. In the unbalanced case Henderson's method III with two random
factors does not lead to a unique solution; this case is not included in
this text. In those situations we recommend using the REML method in
Section 6.2.1.2.

The model for finite level populations is

\[ y_{ijkl} = \mu + a_i + b_j + c_k + (ab)_{ij} + (ac)_{ik} + (bc)_{jk} + (abc)_{ijk} + e_{l(i,j,k)} \tag{7.29} \]
(i = 1, ..., a; j = 1, ..., b; k = 1, ..., c; l = 1, ..., n);

the sums over all effects of the level populations are assumed to be zero.
The SS, df and MS of the three-way cross-classification are given in Table 5.21.
To derive the E(MS) for model equation (7.29), we first use the table below:

                 i             j             k             l
a_i              1 - a/N(A)    b             c             n
b_j              a             1 - b/N(B)    c             n
c_k              a             b             1 - c/N(C)    n
(ab)_{ij}        1 - a/N(A)    1 - b/N(B)    c             n
(ac)_{ik}        1 - a/N(A)    b             1 - c/N(C)    n
(bc)_{jk}        a             1 - b/N(B)    1 - c/N(C)    n
(abc)_{ijk}      1 - a/N(A)    1 - b/N(B)    1 - c/N(C)    n
e_{l(i,j,k)}     1             1             1             1 - n/N(res)


If N(res) → ∞, we obtain the E(MS) of the second column in Table 7.7. In the
three-way cross-classification there exist two types of mixed models. In the first type
the levels of one factor (we choose w.l.o.g. factor C) are randomly selected. In the
second type the levels of two factors (we choose w.l.o.g. factors B and C) are
randomly selected.
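The step from the table above to the E(MS) can be mechanised. The sketch below encodes rule 4 as we read it from the worked examples of this chapter (the rule itself is stated in Section 7.2 and is not reproduced here, so this reading is an assumption): the coefficient of a variance component in the E(MS) of an effect is the product of the entries in that component's row over all columns whose subscript does not belong to the effect, and only components carrying all subscripts of the effect contribute. All function names are hypothetical.

```python
def three_way_cross_table(a, b, c, n, NA, NB, NC, Nres):
    """Rule-4 table for model (7.29); N(...) may be float('inf')."""
    fa, fb, fc, fe = 1 - a / NA, 1 - b / NB, 1 - c / NC, 1 - n / Nres
    table = {
        "a":   {"i": fa, "j": b,  "k": c,  "l": n},
        "b":   {"i": a,  "j": fb, "k": c,  "l": n},
        "c":   {"i": a,  "j": b,  "k": fc, "l": n},
        "ab":  {"i": fa, "j": fb, "k": c,  "l": n},
        "ac":  {"i": fa, "j": b,  "k": fc, "l": n},
        "bc":  {"i": a,  "j": fb, "k": fc, "l": n},
        "abc": {"i": fa, "j": fb, "k": fc, "l": n},
        "e":   {"i": 1,  "j": 1,  "k": 1,  "l": fe},
    }
    subscripts = {
        "a": {"i"}, "b": {"j"}, "c": {"k"},
        "ab": {"i", "j"}, "ac": {"i", "k"}, "bc": {"j", "k"},
        "abc": {"i", "j", "k"}, "e": {"i", "j", "k", "l"},
    }
    return table, subscripts


def ems_coefficients(table, subscripts, effect):
    """Coefficients of the (quasi-)variance components in E(MS) of `effect`."""
    own = subscripts[effect]
    coefficients = {}
    for component, subs in subscripts.items():
        if own <= subs:                        # component carries all subscripts
            product = 1.0
            for column, entry in table[component].items():
                if column not in own:          # skip the effect's own columns
                    product *= entry
            coefficients[component] = product
    return coefficients


# A, B fixed (N(A) = a, N(B) = b), C random and N(res) -> infinity.
inf = float("inf")
table, subs = three_way_cross_table(a=2, b=3, c=4, n=2, NA=2, NB=3, NC=inf, Nres=inf)
coef_a = ems_coefficients(table, subs, "a")
coef_ab = ems_coefficients(table, subs, "ab")
```

For a = 2, b = 3, c = 4, n = 2 the row of A gives the coefficients bcn = 24 for the component of A, bn = 6 for the A × C component and 1 for the residual variance, while the A × B and A × B × C components drop out.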

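The model equation of the first type (A, B fixed, C random) is

\[ y_{ijkl} = \mu + a_i + b_j + c_k + (ab)_{ij} + (ac)_{ik} + (bc)_{jk} + (abc)_{ijk} + e_{l(i,j,k)} \tag{7.30} \]
(i = 1, ..., a; j = 1, ..., b; k = 1, ..., c; l = 1, ..., n).

For N(A) = a, N(B) = b and N(C) → ∞, we obtain the E(MS) for the model
equation (7.30) in the third column of Table 7.7. From this the estimators of
the variance components become

\[ s^2 = MS_{res}, \qquad s_{abc}^2 = \frac{1}{n}(MS_{ABC} - MS_{res}), \]
\[ s_{bc}^2 = \frac{1}{an}(MS_{BC} - MS_{res}), \qquad s_{ac}^2 = \frac{1}{bn}(MS_{AC} - MS_{res}), \tag{7.31} \]
\[ s_c^2 = \frac{1}{abn}(MS_C - MS_{res}). \]

Table 7.7 E(MS) of a three-way cross-classification for a model with finite level populations
and two mixed models.

Column 2: model with finite level populations [N(res) → ∞]
  Levels of A:            $bcn\,\sigma_a^2 + cn\left(1 - \frac{b}{N(B)}\right)\sigma_{ab}^2 + bn\left(1 - \frac{c}{N(C)}\right)\sigma_{ac}^2 + n\left(1 - \frac{b}{N(B)}\right)\left(1 - \frac{c}{N(C)}\right)\sigma_{abc}^2 + \sigma^2$
  Levels of B:            $acn\,\sigma_b^2 + cn\left(1 - \frac{a}{N(A)}\right)\sigma_{ab}^2 + an\left(1 - \frac{c}{N(C)}\right)\sigma_{bc}^2 + n\left(1 - \frac{a}{N(A)}\right)\left(1 - \frac{c}{N(C)}\right)\sigma_{abc}^2 + \sigma^2$
  Levels of C:            $abn\,\sigma_c^2 + bn\left(1 - \frac{a}{N(A)}\right)\sigma_{ac}^2 + an\left(1 - \frac{b}{N(B)}\right)\sigma_{bc}^2 + n\left(1 - \frac{a}{N(A)}\right)\left(1 - \frac{b}{N(B)}\right)\sigma_{abc}^2 + \sigma^2$
  Interaction A × B:      $cn\,\sigma_{ab}^2 + n\left(1 - \frac{c}{N(C)}\right)\sigma_{abc}^2 + \sigma^2$
  Interaction A × C:      $bn\,\sigma_{ac}^2 + n\left(1 - \frac{b}{N(B)}\right)\sigma_{abc}^2 + \sigma^2$
  Interaction B × C:      $an\,\sigma_{bc}^2 + n\left(1 - \frac{a}{N(A)}\right)\sigma_{abc}^2 + \sigma^2$
  Interaction A × B × C:  $n\,\sigma_{abc}^2 + \sigma^2$
  Residual:               $\sigma^2$

Column 3: A, B fixed, C random; model equation (7.30)
  Levels of A:            $\frac{bcn}{a-1}\sum_{i=1}^{a} a_i^2 + bn\,\sigma_{ac}^2 + \sigma^2$
  Levels of B:            $\frac{acn}{b-1}\sum_{j=1}^{b} b_j^2 + an\,\sigma_{bc}^2 + \sigma^2$
  Levels of C:            $abn\,\sigma_c^2 + \sigma^2$
  Interaction A × B:      $\frac{cn}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}(ab)_{ij}^2 + n\,\sigma_{abc}^2 + \sigma^2$
  Interaction A × C:      $bn\,\sigma_{ac}^2 + \sigma^2$
  Interaction B × C:      $an\,\sigma_{bc}^2 + \sigma^2$
  Interaction A × B × C:  $n\,\sigma_{abc}^2 + \sigma^2$
  Residual:               $\sigma^2$

Last column: A fixed, B, C random; model equation (7.32)
  Levels of A:            $\frac{bcn}{a-1}\sum_{i=1}^{a} a_i^2 + cn\,\sigma_{ab}^2 + bn\,\sigma_{ac}^2 + n\,\sigma_{abc}^2 + \sigma^2$
  Levels of B:            $acn\,\sigma_b^2 + an\,\sigma_{bc}^2 + \sigma^2$
  Levels of C:            $abn\,\sigma_c^2 + an\,\sigma_{bc}^2 + \sigma^2$
  Interaction A × B:      $cn\,\sigma_{ab}^2 + n\,\sigma_{abc}^2 + \sigma^2$
  Interaction A × C:      $bn\,\sigma_{ac}^2 + n\,\sigma_{abc}^2 + \sigma^2$
  Interaction B × C:      $an\,\sigma_{bc}^2 + \sigma^2$
  Interaction A × B × C:  $n\,\sigma_{abc}^2 + \sigma^2$
  Residual:               $\sigma^2$

The model equation of the second type (A fixed, B, C random) is

\[ y_{ijkl} = \mu + a_i + b_j + c_k + (ab)_{ij} + (ac)_{ik} + (bc)_{jk} + (abc)_{ijk} + e_{l(i,j,k)} \tag{7.32} \]
(i = 1, ..., a; j = 1, ..., b; k = 1, ..., c; l = 1, ..., n).

If we put in the E(MS) of the model with finite level populations N(A) = a and let
N(B) and N(C) tend to ∞, then we obtain the E(MS) for the model equation
(7.32) in the last column of Table 7.7. From this we obtain the estimators of
the variance components:

\[ s^2 = MS_{res}, \qquad s_{abc}^2 = \frac{1}{n}(MS_{ABC} - MS_{res}), \qquad s_{bc}^2 = \frac{1}{an}(MS_{BC} - MS_{res}), \]
\[ s_{ac}^2 = \frac{1}{bn}(MS_{AC} - MS_{ABC}), \qquad s_{ab}^2 = \frac{1}{cn}(MS_{AB} - MS_{ABC}), \tag{7.33} \]
\[ s_c^2 = \frac{1}{abn}(MS_C - MS_{BC}), \qquad s_b^2 = \frac{1}{acn}(MS_B - MS_{BC}). \]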
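The estimators (7.31) and (7.33) can be collected in one helper. The following sketch is illustrative only; the function name, the dictionary keys and the numbers are our own. In the example the mean squares are set equal to their expectations for the model with B and C random, so that the estimators must return exactly the variance components that were put in.

```python
def three_way_cross_estimates(ms, a, b, c, n, random_factors):
    """ANOVA-method estimates for the balanced three-way cross-classification.

    ms maps "A", "B", "C", "AB", "AC", "BC", "ABC", "res" to mean squares.
    random_factors "C"  : A, B fixed, C random, estimators (7.31).
    random_factors "BC" : A fixed, B, C random, estimators (7.33).
    """
    res = ms["res"]
    est = {
        "s2": res,
        "s2_abc": (ms["ABC"] - res) / n,
        "s2_bc": (ms["BC"] - res) / (a * n),
    }
    if random_factors == "C":
        est["s2_ac"] = (ms["AC"] - res) / (b * n)
        est["s2_c"] = (ms["C"] - res) / (a * b * n)
    elif random_factors == "BC":
        est["s2_ac"] = (ms["AC"] - ms["ABC"]) / (b * n)
        est["s2_ab"] = (ms["AB"] - ms["ABC"]) / (c * n)
        est["s2_c"] = (ms["C"] - ms["BC"]) / (a * b * n)
        est["s2_b"] = (ms["B"] - ms["BC"]) / (a * c * n)
    else:
        raise ValueError("random_factors must be 'C' or 'BC'")
    return est


# Mean squares equal to their expectations for a=2, b=3, c=4, n=2 and the
# (hypothetical) components s2=1, abc=2, bc=3, ac=4, ab=5, c=6, b=7.
a, b, c, n = 2, 3, 4, 2
ms = {
    "res": 1.0,
    "ABC": n * 2 + 1.0,
    "BC": a * n * 3 + 1.0,
    "AC": b * n * 4 + n * 2 + 1.0,
    "AB": c * n * 5 + n * 2 + 1.0,
    "C": a * b * n * 6 + a * n * 3 + 1.0,
    "B": a * c * n * 7 + a * n * 3 + 1.0,
    "A": 0.0,  # not needed for the variance components
}
est = three_way_cross_estimates(ms, a, b, c, n, random_factors="BC")
```

The check only confirms that (7.33) inverts the expectations in the last column of Table 7.7; with real data the estimates can of course deviate from the true components and may even be negative.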


7.5.4 Three-Way Nested Classification 


In the three-way nested classification, there are six mixed models. To save space
here, side conditions are not given. They are analogous to those in Section 7.5.2.
(The sums over all fixed effects of a factor are zero; covariances between random
model components are zero.) At first the model for finite level populations is
discussed, and then the E(MS) are derived by the rules in Section 7.2. Then
the six balanced mixed models are treated; their E(MS) can be derived by the
reader from those of the model with finite level populations. They are summarised
in Tables 7.8 and 7.9. We further give the estimators. The SS, df and
MS are given in Table 5.27. The model with finite level populations has the model
equation

\[ y_{ijkl} = \mu + a_i + b_{j(i)} + c_{k(i,j)} + e_{l(i,j,k)} \tag{7.34} \]
(i = 1, ..., a; j = 1, ..., b; k = 1, ..., c; l = 1, ..., n).

Following rule 4 in Section 7.2, we obtain the following scheme:

                 i             j             k             l
a_i              1 - a/N(A)    b             c             n
b_{j(i)}         1             1 - b/N(B)    c             n
c_{k(i,j)}       1             1             1 - c/N(C)    n
e_{l(i,j,k)}     1             1             1             1 - n/N(res)

and from this the E(MS) in Tables 7.8 and 7.9. The six mixed models are
as follows:

I) Levels of A randomly selected, levels of B, C fixed
II) Levels of B randomly selected, levels of A, C fixed
III) Levels of C randomly selected, levels of A, B fixed
IV) Levels of B and C randomly selected, levels of A fixed
V) Levels of A and C randomly selected, levels of B fixed
VI) Levels of A and B randomly selected, levels of C fixed

The estimators are $s^2 = MS_{res}$ for all six cases and further:

For case I,

\[ s_a^2 = \frac{1}{bcn}(MS_A - MS_{res}). \tag{7.35} \]

For case II,

\[ s^2_{b \text{ in } a} = \frac{1}{cn}(MS_{B\,in\,A} - MS_{res}). \tag{7.36} \]

For case III,

\[ s^2_{c \text{ in } b} = \frac{1}{n}(MS_{C\,in\,B} - MS_{res}). \tag{7.37} \]

For case IV,

\[ s^2_{c \text{ in } b} = \frac{1}{n}(MS_{C\,in\,B} - MS_{res}), \qquad s^2_{b \text{ in } a} = \frac{1}{cn}(MS_{B\,in\,A} - MS_{C\,in\,B}). \tag{7.38} \]

For case V,

\[ s^2_{c \text{ in } b} = \frac{1}{n}(MS_{C\,in\,B} - MS_{res}), \qquad s_a^2 = \frac{1}{bcn}(MS_A - MS_{C\,in\,B}). \tag{7.39} \]

For case VI,

\[ s^2_{b \text{ in } a} = \frac{1}{cn}(MS_{B\,in\,A} - MS_{res}), \qquad s_a^2 = \frac{1}{bcn}(MS_A - MS_{B\,in\,A}). \tag{7.40} \]

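Table 7.8 E(MS) for a balanced three-way nested classification: models with finite level populations and mixed models with one random factor.

Model with finite level populations (see Example 7.1)
  Between A:             $bcn\,\sigma_a^2 + cn\left(1 - \frac{b}{N(B)}\right)\sigma^2_{b \text{ in } a} + n\left(1 - \frac{c}{N(C)}\right)\sigma^2_{c \text{ in } b} + \sigma^2$
  Between B in A:        $cn\,\sigma^2_{b \text{ in } a} + n\left(1 - \frac{c}{N(C)}\right)\sigma^2_{c \text{ in } b} + \sigma^2$
  Between C in B and A:  $n\,\sigma^2_{c \text{ in } b} + \sigma^2$
  Residual:              $\sigma^2$

A random, B, C fixed (I)
  Between A:             $bcn\,\sigma_a^2 + \sigma^2$
  Between B in A:        $\frac{cn}{a(b-1)}\sum_{i,j} b_{j(i)}^2 + \sigma^2$
  Between C in B and A:  $\frac{n}{ab(c-1)}\sum_{i,j,k} c_{k(i,j)}^2 + \sigma^2$
  Residual:              $\sigma^2$

B random, A, C fixed (II)
  Between A:             $\frac{bcn}{a-1}\sum_{i} a_i^2 + cn\,\sigma^2_{b \text{ in } a} + \sigma^2$
  Between B in A:        $cn\,\sigma^2_{b \text{ in } a} + \sigma^2$
  Between C in B and A:  $\frac{n}{ab(c-1)}\sum_{i,j,k} c_{k(i,j)}^2 + \sigma^2$
  Residual:              $\sigma^2$

C random, A, B fixed (III)
  Between A:             $\frac{bcn}{a-1}\sum_{i} a_i^2 + n\,\sigma^2_{c \text{ in } b} + \sigma^2$
  Between B in A:        $\frac{cn}{a(b-1)}\sum_{i,j} b_{j(i)}^2 + n\,\sigma^2_{c \text{ in } b} + \sigma^2$
  Between C in B and A:  $n\,\sigma^2_{c \text{ in } b} + \sigma^2$
  Residual:              $\sigma^2$

Table 7.9 E(MS) for a balanced three-way nested classification: models with one fixed factor.

A fixed, B, C random (IV)
  Between A:             $\frac{bcn}{a-1}\sum_{i=1}^{a} a_i^2 + cn\,\sigma^2_{b \text{ in } a} + n\,\sigma^2_{c \text{ in } b} + \sigma^2$
  Between B in A:        $cn\,\sigma^2_{b \text{ in } a} + n\,\sigma^2_{c \text{ in } b} + \sigma^2$
  Between C in B and A:  $n\,\sigma^2_{c \text{ in } b} + \sigma^2$
  Residual:              $\sigma^2$

B fixed, A, C random (V)
  Between A:             $bcn\,\sigma_a^2 + n\,\sigma^2_{c \text{ in } b} + \sigma^2$
  Between B in A:        $\frac{cn}{a(b-1)}\sum_{i,j} b_{j(i)}^2 + n\,\sigma^2_{c \text{ in } b} + \sigma^2$
  Between C in B and A:  $n\,\sigma^2_{c \text{ in } b} + \sigma^2$
  Residual:              $\sigma^2$

C fixed, A, B random (VI)
  Between A:             $bcn\,\sigma_a^2 + cn\,\sigma^2_{b \text{ in } a} + \sigma^2$
  Between B in A:        $cn\,\sigma^2_{b \text{ in } a} + \sigma^2$
  Between C in B and A:  $\frac{n}{ab(c-1)}\sum_{i,j,k} c_{k(i,j)}^2 + \sigma^2$
  Residual:              $\sigma^2$

7.5.5 Three-Way Mixed Classification


In Chapters 5 and 6, mixed classifications with three factors have been considered.
Mixed models for the two types of mixed three-way classification are now
discussed.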


7.5.5.1 The Type (B < A) × C
For the balanced case of the mixed classification (model equation (5.45) discussed
in Section 5.4.3.1), first the E(MS) for the model with finite level populations
are derived. The E(MS) are given in Table 7.10.
For the six mixed models, the E(MS) can be found in Tables 7.11 and 7.12.
The estimators for the variance components besides $s^2 = MS_{res}$ are given below.
With rule 4 in Section 7.2, we obtain

                 i             j             k             l
a_i              1 - a/N(A)    b             c             n
b_{j(i)}         1             1 - b/N(B)    c             n
c_k              a             b             1 - c/N(C)    n
(ac)_{ik}        1 - a/N(A)    b             1 - c/N(C)    n
(bc)_{jk(i)}     1             1 - b/N(B)    1 - c/N(C)    n
e_{l(i,j,k)}     1             1             1             1 - n/N(res)

For the six models the estimators of the variance components are the
following:

A fixed, B, C random:

\[ s^2_{bc \text{ in } a} = \frac{1}{n}(MS_{B \times C\,in\,A} - MS_{res}), \qquad s_{ac}^2 = \frac{1}{bn}(MS_{A \times C} - MS_{B \times C\,in\,A}), \]
\[ s_c^2 = \frac{1}{abn}(MS_C - MS_{B \times C\,in\,A}), \qquad s^2_{b \text{ in } a} = \frac{1}{cn}(MS_{B\,in\,A} - MS_{B \times C\,in\,A}). \tag{7.41} \]

Table 7.10 E(MS) of the mixed classification (B < A) × C: model with finite level populations
(N(res) → ∞) (see Example 7.1).

Source of variation        E(MS)
Between A                  $bcn\,\sigma_a^2 + cn\left(1 - \frac{b}{N(B)}\right)\sigma^2_{b \text{ in } a} + bn\left(1 - \frac{c}{N(C)}\right)\sigma_{ac}^2 + n\left(1 - \frac{b}{N(B)}\right)\left(1 - \frac{c}{N(C)}\right)\sigma^2_{bc \text{ in } a} + \sigma^2$
Between B in A             $cn\,\sigma^2_{b \text{ in } a} + n\left(1 - \frac{c}{N(C)}\right)\sigma^2_{bc \text{ in } a} + \sigma^2$
Between C                  $abn\,\sigma_c^2 + bn\left(1 - \frac{a}{N(A)}\right)\sigma_{ac}^2 + n\left(1 - \frac{b}{N(B)}\right)\sigma^2_{bc \text{ in } a} + \sigma^2$
Interaction A × C          $bn\,\sigma_{ac}^2 + n\left(1 - \frac{b}{N(B)}\right)\sigma^2_{bc \text{ in } a} + \sigma^2$
Interaction B × C in A     $n\,\sigma^2_{bc \text{ in } a} + \sigma^2$
Residual                   $\sigma^2$

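B fixed, A, C random:

\[ s^2_{bc \text{ in } a} = \frac{1}{n}(MS_{B \times C\,in\,A} - MS_{res}), \qquad s_{ac}^2 = \frac{1}{bn}(MS_{A \times C} - MS_{res}), \]
\[ s_c^2 = \frac{1}{abn}(MS_C - MS_{A \times C}), \qquad s_a^2 = \frac{1}{bcn}(MS_A - MS_{A \times C}). \tag{7.42} \]

C fixed, A, B random:

\[ s^2_{bc \text{ in } a} = \frac{1}{n}(MS_{B \times C\,in\,A} - MS_{res}), \qquad s_{ac}^2 = \frac{1}{bn}(MS_{A \times C} - MS_{B \times C\,in\,A}), \]
\[ s^2_{b \text{ in } a} = \frac{1}{cn}(MS_{B\,in\,A} - MS_{res}), \qquad s_a^2 = \frac{1}{bcn}(MS_A - MS_{B\,in\,A}). \tag{7.43} \]

Table 7.11 E(MS) of the mixed classification (B < A) × C: models with one fixed factor.

A fixed, B, C random
  Between A:                $\frac{bcn}{a-1}\sum_{i} a_i^2 + cn\,\sigma^2_{b \text{ in } a} + bn\,\sigma_{ac}^2 + n\,\sigma^2_{bc \text{ in } a} + \sigma^2$
  Between B in A:           $cn\,\sigma^2_{b \text{ in } a} + n\,\sigma^2_{bc \text{ in } a} + \sigma^2$
  Between C:                $abn\,\sigma_c^2 + n\,\sigma^2_{bc \text{ in } a} + \sigma^2$
  Interaction A × C:        $bn\,\sigma_{ac}^2 + n\,\sigma^2_{bc \text{ in } a} + \sigma^2$
  Interaction B × C in A:   $n\,\sigma^2_{bc \text{ in } a} + \sigma^2$
  Residual:                 $\sigma^2$

B fixed, A, C random
  Between A:                $bcn\,\sigma_a^2 + bn\,\sigma_{ac}^2 + \sigma^2$
  Between B in A:           $\frac{cn}{a(b-1)}\sum_{i,j} b_{j(i)}^2 + n\,\sigma^2_{bc \text{ in } a} + \sigma^2$
  Between C:                $abn\,\sigma_c^2 + bn\,\sigma_{ac}^2 + \sigma^2$
  Interaction A × C:        $bn\,\sigma_{ac}^2 + \sigma^2$
  Interaction B × C in A:   $n\,\sigma^2_{bc \text{ in } a} + \sigma^2$
  Residual:                 $\sigma^2$

C fixed, A, B random
  Between A:                $bcn\,\sigma_a^2 + cn\,\sigma^2_{b \text{ in } a} + \sigma^2$
  Between B in A:           $cn\,\sigma^2_{b \text{ in } a} + \sigma^2$
  Between C:                $\frac{abn}{c-1}\sum_{k} c_k^2 + bn\,\sigma_{ac}^2 + n\,\sigma^2_{bc \text{ in } a} + \sigma^2$
  Interaction A × C:        $bn\,\sigma_{ac}^2 + n\,\sigma^2_{bc \text{ in } a} + \sigma^2$
  Interaction B × C in A:   $n\,\sigma^2_{bc \text{ in } a} + \sigma^2$
  Residual:                 $\sigma^2$

A random, B, C fixed:

\[ s_{ac}^2 = \frac{1}{bn}(MS_{A \times C} - MS_{res}), \qquad s_a^2 = \frac{1}{bcn}(MS_A - MS_{res}). \tag{7.44} \]

B random, A, C fixed:

\[ s^2_{bc \text{ in } a} = \frac{1}{n}(MS_{B \times C\,in\,A} - MS_{res}), \qquad s^2_{b \text{ in } a} = \frac{1}{cn}(MS_{B\,in\,A} - MS_{res}). \tag{7.45} \]

C random, A, B fixed:

\[ s^2_{bc \text{ in } a} = \frac{1}{n}(MS_{B \times C\,in\,A} - MS_{res}), \qquad s_{ac}^2 = \frac{1}{bn}(MS_{A \times C} - MS_{res}), \qquad s_c^2 = \frac{1}{abn}(MS_C - MS_{res}). \tag{7.46} \]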

The E(MS) of the six models are given in Table 7.11 and in Table 7.12. 


7.5.5.2 The Type C<AB 

For the type C < AB of the mixed classification the E(MS) for the model with 
finite level populations have been derived in Section 7.2, Example 7.4. The fol- 
lowing mixed models exist: The E(MS) for these four cases can be found in 
Table 7.13. The estimators of the variance components are given below: 


Fall 1: Cfixed, A or B (W.Lo.g. we choose A) fixed, 

Fall II: Cfixed, A and B random, 

Fall III: Crandom, A and B fixed, 

Fall IV: Crandom, A or B (W.Lo.g. we choose A) random. 


Case I: 


S5 = =(MS, xB -MS es); 
(7.47) 


2 
= — (MS5-MS yes); 
%% acn = . ) 


Table 7.12 E(MS) of the mixed classification (B < A) x C — models with one random factor. 


Source of variation 


A random, B, C fixed 


B random, A, C fixed 


C random, A, B fixed 


Between A 


Between B in A 


Between C 


Interaction A x C 


Interaction Bx Cin A 


Residual 


beno? + 0? 


bno?. + 0” 


aT (bean + 0° 


o 


abn 2 


2 2 
Cy + HOG in gt © 


c-14 


bi 


n 2 
eye +10 h6 ina tO 
i, 


(2-1) 


2 2 
Nrcin a + © 


Table 7.13 E(MS) for mixed models of the mixed three-way classification of type C < AB. 


Source of variation B random, A, C fixed A, B random, C fixed C random, A, B fixed A, C random, B fixed 
Between A ben bene? + cno>,, + 0” ben beno? + no? +07 
—S°a+cno?, +0" 2 ab —— Soap +n? +0" 4 ¢ in ab 
i ‘ab i c in ab 
a-lFS a-1 
Di 2 2 2 2 acn acn 
Between B acno;, + 6 acno;, + no), + 0 B+ no? in gp tO B? +10? in gy + NO, +0 
-_ jj cin a iD j cin a a 
J J 
i n n 2 2 2 2 
Between C in A x B Wei » > hai +e Wet 5 ; Fie +e NO* in ab + © NO? in ab + © 
ab(c-1) ab(c-1) 
‘ 2 2 v) 2 2 2 2 2 
Interaction A x B CNOA, +O CNOny + O j ny; (ab); 7 NO’ in ah + COG, + O 
nor, 
¢ in ab * (q—4)(b-1) 
2 2 2 


Residual o o o o 
Case II:

$$s_{ab}^2 = \frac{1}{cn}\left(MS_{A\times B} - MS_{res}\right), \qquad s_b^2 = \frac{1}{acn}\left(MS_B - MS_{A\times B}\right). \tag{7.48}$$


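Case III:

$$s_{c\,\mathrm{in}\,ab}^2 = \frac{1}{n}\left(MS_{C\,\mathrm{in}\,A\times B} - MS_{res}\right). \tag{7.49}$$

Case IV:

$$s_{c\,\mathrm{in}\,ab}^2 = \frac{1}{n}\left(MS_{C\,\mathrm{in}\,A\times B} - MS_{res}\right), \qquad s_{ab}^2 = \frac{1}{cn}\left(MS_{A\times B} - MS_{C\,\mathrm{in}\,A\times B}\right), \qquad s_a^2 = \frac{1}{bcn}\left(MS_A - MS_{C\,\mathrm{in}\,A\times B}\right). \tag{7.50}$$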


7.6 Exercises 
7.1 Use the data set D in Exercise 6.3. The six boars must randomly be split 
into two groups. The groups are now understood as two locations as levels 


of a fixed factor. What model do we now have? Call the generated file D1. 


7.2 For file D1 test the null hypothesis that there are no differences between 
the locations. 


7.3 Estimate from D1 the variance component of the factor ‘boar’. 


References 


Hartley, H. O. and Rao, J. N. K. (1967) Maximum likelihood estimation for the mixed 
analysis of variance model. Biometrika, 54, 92-108. 

Henderson, C. R. (1953) Estimation of variance and covariance components. 
Biometrics, 9, 226-252. 




Searle, S. R. (1971, 2012) Linear Models, John Wiley & Sons, Inc., New York. 
Seifert, B. (1980) Prüfung linearer Hypothesen über die festen Effekte in balancierten
gemischten Modellen der Varianzanalyse. Diss. Sektion Mathematik, Humboldt
Universität Berlin.
Seifert, B. (1981) Explicit formulae of exact tests in mixed balanced ANOVA-models. 
Biom. J., 23, 535-550. 




8 


Regression Analysis - Linear Models with Non-random
Regressors (Model I of Regression Analysis) and with
Random Regressors (Model II of Regression Analysis)


8.1. Introduction 


In this chapter we describe relations between two or more magnitudes with
statistical methods.

Dependencies between magnitudes can be found in several laws of nature.
There is a dependency between the height $h$ of a physical body falling under the
influence of gravity (in a vacuum) and the falling time $t$ of the form $h = at^2$, and
the relationship provided by this formula is a special function, a so-called
functional relationship. Similar equations can be given for the relationship between
pressure and temperature or between brightness and distance from a light source.
The relationship is strict, that is, for each value of $t$ there is a unique $h$-value;
in other words, with appropriate accuracy the same $t$-value always results in the
same $h$-value. One could calculate $a$ from the formula above by setting $t$ and
measuring $h$, if there is no measurement error. The $h$-values for various
$t$-values lie on a curve (parabola) when $t$ is plotted on the abscissa and $h$ on the
ordinate of a coordinate system. In this example, one could just as well fix $h$
and measure the time. In functional relationships, therefore, it does not matter
which variable is given and which is measured, unless other aspects (accuracy,
effort in the measurement), which have nothing to do with the relationship itself,
lead to the preference of one of these variables.

There are events and variables in nature between which there is no functional
relationship, although they clearly depend on each other. For instance, consider
height at withers and age, or height at withers and chest girth of cattle.
Although there is obviously no formula by which the chest girth or the age of
cattle can be calculated from the height at withers, there is nevertheless an
obvious connection between them. This can be seen when both measurements are
available for several animals and the value pair of each animal is represented
by a point in a coordinate system. These points do not lie on a curve, as in the
case of a functional dependency; they rather form a point cloud or, as we say,
a scatter diagram. In such a cloud, a clear trend is frequently recognizable,
which suggests the existence of a relationship. Such relationships, which are not
strictly functional, are called stochastic, and their investigation is the main
subject of regression analysis.

Even if a functional relationship between two features exists, it may happen 
that the graphic representation of the measured value pairs is a point cloud; this 
is the case if the characteristic values cannot be observed without greater meas- 
urement errors. 

The cloud itself is only a clue to the nature of the relationship between two
variables and suggests its existence. It is required, however, to describe the
relationship precisely through a formula-based representation. In all cases,
the estimation target is a function of the independent variable called the
regression function. In regression analysis, it is also of interest to characterize
the variation of the dependent variable around the regression function, which
can be described by a probability distribution. One must distinguish two
important special cases, which we characterize for the case of two variables
$x$, $y$; the generalization to more than two variables is left to the reader.
In the first case, $x$ is a non-random variable. Most commonly, regression
analysis estimates the conditional expectation $E(y \mid x_i) = f(x_i)$ of the regressand
variable given the value of the regressor variable, that is, the average value of
the dependent variable when the independent variable is fixed. The relationship
is modelled by

$$y_i = y(x_i) = f(x_i) + e_i \tag{8.1}$$

or

$$y = f(x) + e.$$

The $e_i$ are random variables with $E(e_i) = 0$, $\mathrm{var}(e_i) = \sigma^2$ and $\mathrm{cov}(e_i, e_j) = 0$ for $i \ne j$.
Often the distribution of the $e_i$ is assumed to be normal, $N(0, \sigma^2)$.

We call this a model I of regression analysis. An example is the relationship
between the height at withers and the age of cattle. Of course, a functional
relationship in which only the measured values of one variable (taken as $y$)
are strongly influenced by measurement errors can also be written in the form of
Equation (8.1) and analysed with the model I regression. The functional
relationship is between and through y = g(x). 

In this chapter all occurring functions are assumed to be differentiable for all 
their arguments. 

In the second case both $x$ and $y$ are random variables distributed by a two-
dimensional distribution with density function $g(x, y)$, marginal expectations
$\mu_x$, $\mu_y$, marginal variances $\sigma_x^2$, $\sigma_y^2$ and covariance $\sigma_{xy}$. Regression of $x$ on $y$ or
of $y$ on $x$ means the conditional expectations $E(x \mid y)$ and $E(y \mid x)$, respectively.
If $g(x, y)$ is the density function of a two-dimensional normal distribution, then
the conditional expectations $E(x \mid y)$ and $E(y \mid x)$ are linear functions
of $y$ and $x$, respectively,

$$E(x \mid y) = \alpha + \beta y, \qquad E(y \mid x) = \alpha^* + \beta^* x.$$

The random variables $x$ or $y$ deviate by $e$ or $e^*$, respectively, from $E(x \mid y)$ or
$E(y \mid x)$; therefore the stochastic dependency between $x$ and $y$ is either of the form

$$x = E(x \mid y) + e = \alpha + \beta y + e \tag{8.2}$$

or of the form

$$y = E(y \mid x) + e^* = \alpha^* + \beta^* x + e^*. \tag{8.3}$$


Equations (8.2) and (8.3) cannot be transformed into each other, which
means that

$$y = \frac{x - \alpha - e}{\beta} \quad \text{and} \quad y = \alpha^* + \beta^* x + e^*$$

differ from each other. This is easy to see if we look at the meaning
of $\alpha$, $\beta$, $\alpha^*$ and $\beta^*$.

An equation of the form $x = E(x \mid y) + e$ or $y = E(y \mid x) + e^*$ is called a model II of
regression analysis if the conditioning variables are random. An example is the
relationship between wither height and chest girth mentioned above.

The difference between both models becomes clear by looking at the examples
above. In the dependency of wither height on age, the age can be understood as a
non-random variable (chosen in advance by the investigator). The wither height
is considered dependent on age and not the age dependent on the wither height.
The dependency between wither height and chest girth is modelled by two random
variables. Therefore two equations analogous to (8.2) and (8.3) are possible.

The function $y = f(x)$ in (8.1) and the functions $E(x \mid y)$ and $E(y \mid x)$ are called
regression functions. The argument variable is called the regressor or the
influencing variable (in program packages often the misleading expression
‘independent variable’ is used, but in model II both variables are dependent on
each other). The variables $y$ in (8.1), $x$ in (8.2) and $y$ in (8.3) are called regressand
(or dependent variable). In this chapter we assume that regression functions are
a special case of the theory of linear models in Chapter 4.




Definition 8.1 Let $X$ be an $[n \times (k+1)]$ matrix of rank $k + 1 < n$ and $\Omega = R[X]$
the rank space of $X$. Further, let $\beta \in \Omega$ be a vector of parameters $\beta_j$ $(j = 0, \ldots, k)$
and $Y = Y_n$ an $n$-dimensional random variable. If the relations $E(e) = 0_n$ and
$\mathrm{var}(e) = \sigma^2 I_n$ for the error term $e$ are valid, then

$$Y = X\beta + e \quad (Y \in R^n,\ \beta \in \Omega = R[X]) \tag{8.4}$$

with $X = (1_n, X^*)$ is called model I of the linear regression with $k$ regressors in
standard form.

Equation (8.4) is, as shown in Example 4.3, a special case of Equation (4.1). As
shown in Chapter 4, $\mathrm{var}(e) = V\sigma^2$ with a positive definite matrix $V$ can be reduced
to (8.4). At first we consider (8.4) and later also use $\mathrm{var}(e) = V\sigma^2$.
8.2 Parameter Estimation


8.2.1 Least Squares Method 


For model I of regression we can prove the following. 


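Theorem 8.1 The BLUE of $\beta$ and an unbiased estimator of $\sigma^2$ are given by

$$b = \hat{\beta} = (X^T X)^{-1} X^T Y \tag{8.5}$$

and

$$s^2 = \frac{1}{n-k-1}\left\|Y - X(X^T X)^{-1} X^T Y\right\|^2 = \frac{1}{n-k-1}\, Y^T\left(I_n - X(X^T X)^{-1} X^T\right) Y. \tag{8.6}$$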


The proof follows from Example 4.3. 
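The estimators (8.5) and (8.6) can be checked numerically. The following is a minimal sketch in Python with NumPy and is not part of the original text; the function name `ls_estimates`, the simulated data and the chosen parameter values are assumptions made only for illustration.

```python
import numpy as np


def ls_estimates(X, y):
    """Return (b, s2): the LS estimate (8.5) and the variance estimate (8.6).

    X is assumed to be an n x (k+1) matrix of full column rank whose
    first column consists of ones.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                      # p = k + 1
    # lstsq solves the least squares problem without forming (X^T X)^{-1}
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b
    s2 = residuals @ residuals / (n - p)
    return b, s2


# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.3, size=n)

b, s2 = ls_estimates(X, y)
# The same estimate obtained literally from the normal equations X^T X b = X^T Y
b_normal = np.linalg.solve(X.T @ X, X.T @ y)
```

Solving the normal equations directly reproduces (8.5) term by term, while `lstsq` is usually preferable numerically when the columns of $X$ are nearly collinear.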


Theorem 8.2 If $Y$ in (8.4) is $N(X\beta, \sigma^2 I_n)$-distributed, then the MLE of $\beta$ is given
by (8.5) and the MLE of $\sigma^2$ is given by

$$\tilde{\sigma}^2 = \frac{1}{n}\left\|Y - X(X^T X)^{-1} X^T Y\right\|^2. \tag{8.7}$$

$b$ in (8.5) and $s^2$ in (8.6) are sufficient with respect to $\beta$ and $\sigma^2$. $b$ is
$N[\beta, \sigma^2 (X^T X)^{-1}]$-distributed, and $\frac{n-k-1}{\sigma^2}\, s^2$ is independent of $b$ and
$CS(n-k-1)$-distributed.


Proof: That $b$ in (8.5) and $\tilde{\sigma}^2$ in (8.7) are the MLE of $\beta$ and $\sigma^2$ follows from
Example 4.3 together with (4.13) and (4.14). With $\mu = X\beta$, $\Sigma = \sigma^2 I_n$ and
$A = (X^T X)^{-1} X^T$, it follows that $b$ with

$$E(b) = A\mu = (X^T X)^{-1} X^T X \beta = \beta$$

and

$$\mathrm{var}(b) = A \Sigma A^T = (X^T X)^{-1} X^T \sigma^2 I_n X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}$$

is $(k+1)$-dimensional normally distributed.
To show that $\frac{(n-k-1)s^2}{\sigma^2}$ is $CS(n-k-1)$-distributed, we have to show that
$I_n - X(X^T X)^{-1} X^T = K$ is idempotent of rank $n-k-1$ and
$\lambda = \frac{1}{\sigma^2}\beta^T X^T K X \beta = 0$. The idempotence of $K$ is obvious. Because with $X$ also
$X^T X$ and $X(X^T X)^{-1} X^T$ are of rank $k+1$, due to the idempotence of
$X(X^T X)^{-1} X^T = B$, there exists an orthonormal matrix $T$ so that $T^T B T$ is a diagonal
matrix with $k+1$ values 1 and $n-k-1$ zeros. Therefore $\mathrm{rk}(K) = n-k-1$.
Finally

$$\sigma^2 \lambda = \beta^T X^T \left(I_n - X(X^T X)^{-1} X^T\right) X \beta = 0.$$

By this $b$ and $Y^T K Y$ are independent because $(X^T X)^{-1} X^T K = 0$.

We only have to show the sufficiency of $b$ and $s^2$. This is done if the likelihood
function $L(Y, \beta, \sigma^2)$ can be written in the form (1.3). From our assumption it
follows that

$$L(Y, \beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{-\frac{1}{2\sigma^2}\|Y - X\beta\|^2\right\}.$$

From (8.6) the identity

$$\|Y - X\beta\|^2 = \|Y - Xb\|^2 + (b-\beta)^T (X^T X)(b-\beta) = (n-k-1)s^2 + f(\beta, b)$$

follows for a certain $f(\beta, b)$ such that

$$L(Y, \beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{-\frac{(n-k-1)s^2 + f(\beta, b)}{2\sigma^2}\right\}$$

has the form (1.3) and the theorem is proven.


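Example 8.1 If the number of regressors in (8.4) is $k = 1$, we have a linear
regression with one regressor or a so-called simple linear regression. With
$k = 1$ we get

$$X^T = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix} \quad \text{and} \quad \beta^T = (\beta_0, \beta_1),$$

and (8.4) becomes

$$y_i = \beta_0 + \beta_1 x_i + e_i \quad (i = 1, \ldots, n). \tag{8.8}$$

$\mathrm{rk}(X) = k + 1 = 2$ means that at least two of the $x_i$ must be different. We look for
the estimators of the coefficients $\beta_0$ and $\beta_1$. By the least squares method, an
empirical regression line $\hat{y} = b_0 + b_1 x$ in the $(x, y)$-coordinate system has to be
found as an estimate of the ‘true’ regression line $y = \beta_0 + \beta_1 x$ in such a way that,
if it is put into the scatter diagram $(x_i, y_i)$, $S = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$ is
minimised.

The values of $\beta_0$ and $\beta_1$ minimising $S$ are denoted by $b_0$ and $b_1$. We obtain
the following equations by setting the partial derivatives of $S$ with respect to
$\beta_0$ and $\beta_1$ equal to zero and replacing all realisations $y_i$ by the corresponding
random variables:

$$b_1 = \hat{\beta}_1 = \frac{\sum_{i=1}^n x_i y_i - \frac{1}{n}\sum_{i=1}^n x_i \sum_{i=1}^n y_i}{\sum_{i=1}^n x_i^2 - \frac{1}{n}\left(\sum_{i=1}^n x_i\right)^2} = \frac{n\sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}, \tag{8.9}$$

$$b_0 = \hat{\beta}_0 = \frac{\sum_{i=1}^n y_i}{n} - b_1\frac{\sum_{i=1}^n x_i}{n} = \bar{y} - b_1 \bar{x}. \tag{8.10}$$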

Because S is convex, we really get a minimum. 

We call estimators obtained in this way least squares estimators or in short LS
estimators. As in the analysis of variance (ANOVA), we abbreviate the sums of
squared deviations by

$$SS_x = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = \sum (x - \bar{x})^2$$

and

$$SS_y = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = \sum (y - \bar{y})^2.$$

Analogously, for the sum of products

$$SP_{xy} = \sum xy - \frac{\sum x \sum y}{n} = \sum (x - \bar{x})(y - \bar{y}),$$

the symbol $SP_{xy}$ ($SP_{xx} = SS_x$) is used. Then $b_1$ can be written as

$$b_1 = \frac{SP_{xy}}{SS_x}.$$


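Equations (8.9) and (8.10) are special cases of (8.5). We get (all summations
from $i = 1$ to $i = n$)

$$X^T X = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}, \qquad X^T Y = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}.$$

Now

$$(X^T X)^{-1} = \frac{1}{|X^T X|}\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix}$$

with the determinant $|X^T X| = n\sum x_i^2 - \left(\sum x_i\right)^2$, and this leads to (8.9) and (8.10).

An estimator $s^2$ of $\sigma^2$ is by (8.6) equal to

$$s^2 = \frac{\sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2}{n-2} = \frac{SS_y - \frac{SP_{xy}^2}{SS_x}}{n-2}. \tag{8.11}$$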
(8.11) follows from (8.6) because

$$Y^T\left(I_n - X(X^T X)^{-1} X^T\right) Y = \sum y_i^2 - Y^T X (X^T X)^{-1} X^T Y,$$

if we insert $X^T Y = (Y^T X)^T$ and $(X^T X)^{-1}$ as given above.
Putting the variables $x$ and $y$ in a coordinate system, with the values of the
variable $x$ on the abscissa and the realisations $y_i$ of the random variables $y_i$ (or
their estimated expectations $\hat{y}_i$) on the ordinate, we obtain by

$$\hat{y}_i = b_0 + b_1 x_i \quad (i = 1, \ldots, n) \tag{8.12}$$

a straight line with slope $b_1$ and intercept $b_0$. This straight line is called an
(estimated) regression line. It connects the estimated expected values $\hat{y}_i$ for the
values of $x_i$. If we put the observed values $(x_i, y_i)$ as points in the coordinate
system, we obtain a scatter diagram. Amongst all lines that could be put into
this scatter diagram, the regression line is the one for which the sum of the
squares of all distances parallel to the ordinate between the points and the
straight line is minimal. The value of $b_1$ (or $\beta_1$) shows by how many units $y$
changes on average if $x$ increases by one unit. The distribution
of $b_0$ and $b_1$ is given by the corollary of Theorem 8.1.


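Corollary 8.1 Under model equation (8.8) and its side conditions, the estimators
$b_0$ given by (8.10) and $b_1$ given by (8.9) have the expectations

$$E(b_0) = \beta_0, \qquad E(b_1) = \beta_1, \tag{8.13}$$

the variances

$$\sigma_0^2 = \mathrm{var}(b_0) = \frac{\sigma^2 \sum x_i^2}{n\sum (x_i - \bar{x})^2}, \qquad \sigma_1^2 = \mathrm{var}(b_1) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}, \tag{8.14}$$

and the covariance

$$\mathrm{cov}(b_0, b_1) = -\frac{\sigma^2 \sum x_i}{n\sum (x_i - \bar{x})^2}. \tag{8.15}$$

If $Y$ is normally distributed, then $b_0$ and $b_1$ are also normally distributed.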
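The closed forms (8.14) and (8.15) are the entries of $\sigma^2 (X^T X)^{-1}$ for $k = 1$. The following Python sketch (not from the original text) compares both; the function name `simple_regression_moments`, the $x$-values and the value $\sigma^2 = 1$ are assumptions used only for the check.

```python
import numpy as np


def simple_regression_moments(x, sigma2):
    """Variances and covariance of b0 and b1 according to (8.14) and (8.15)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ss_x = np.sum((x - x.mean()) ** 2)
    var_b0 = sigma2 * np.sum(x ** 2) / (n * ss_x)
    var_b1 = sigma2 / ss_x
    cov_b0_b1 = -sigma2 * np.sum(x) / (n * ss_x)
    return var_b0, var_b1, cov_b0_b1


# Assumed regressor values and error variance (illustration only)
x = np.array([1.0, 60.0, 124.0, 223.0, 303.0])
sigma2 = 1.0

var_b0, var_b1, cov_b0_b1 = simple_regression_moments(x, sigma2)

# Matrix form: var(b) = sigma^2 (X^T X)^{-1} with X = (1_n, x)
X = np.column_stack([np.ones_like(x), x])
cov_matrix = sigma2 * np.linalg.inv(X.T @ X)
```

The diagonal of `cov_matrix` corresponds to $\sigma_0^2$ and $\sigma_1^2$, the off-diagonal element to $\mathrm{cov}(b_0, b_1)$.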


Of course we can use other loss functions than the quadratic one. In place
of the sum of squared deviations, we could, for instance, use the sum of the $p$-th
powers of the absolute values of the deviations

$$S^* = \sum_{i=1}^{n} |y_i - \beta_0 - \beta_1 x_i|^p$$

and minimise it ($L_p$-norm). Historically, this was done even before the LS
estimators came into use, by Bošković, an astronomer from Ragusa (today
Dubrovnik). Between 1750 and 1753 he used, for astronomical calculations, a
method for fitting functions by minimising the sum of the absolute residuals,
that is, an $L_1$-norm. Carl Friedrich Gauss took notice of Bošković's work on
‘orbital determination of luminaries’ (see Eisenhart, 1961). A modern description
of the parameter estimation using the $L_1$ loss function, iteration methods and
asymptotic properties of the estimates can be found in Bloomfield and Steiger
(1983) and for other $p$-values in Gonin and Money (1989).

If $k$ regressors $x_1, \ldots, x_k$ are given with the values $x_{i1}, \ldots, x_{ik}$ $(i = 1, \ldots, n)$,
we call (8.4) for $k > 1$ the model equation of the multiple linear regression. Then
the components $b_j = \hat{\beta}_j$ of $\hat{\beta}$ in (8.5) are the estimators with realisations
minimising

$$S = \sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\right)^2.$$

The right-hand side of the equation

$$\hat{y}_i = \hat{E}(y_i) = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik} \tag{8.16}$$

is the estimator of the expectation of the $y_i$. The corresponding equation in the
realisations,

$$\hat{y}_i = \hat{E}(y_i) = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik},$$

defines a hyperplane, called the estimated regression plane. The $b_j$, as random
variables and as realisations, are called regression coefficients. For the
estimator of $\sigma^2$, we write

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-k-1}. \tag{8.17}$$

By (8.4) we can also describe non-linear dependencies between $Y$ and one
regressor $x = x_1$ (but also more regressors) if this non-linearity is of a special
form.
We restrict ourselves to one regressor; the extension to more regressors is
simple and is left to the reader. In generalisation of (8.8), let

$$y_i = f(x_i) + e_i. \tag{8.18}$$

In (8.8) we had $f(x) = \beta_0 + \beta_1 x$.
Definition 8.2 Let $k + 1$ linearly independent functions $g_i(x)$ $(x \in B \subset R^1;\ g_0(x) \equiv 1)$
be given, of which at least one is non-linear in $x$. If the non-linear regression
function with $k + 1$ (unknown) parameters $\alpha_i$ can be written as

$$f(x) = f(x, \alpha_0, \ldots, \alpha_k) = \sum_{i=0}^{k} \alpha_i g_i(x), \tag{8.19}$$

and the (known) functions $g_i(x)$ are independent of the parameters $\alpha_i$, we call the
function $f(x)$, which is linear in the $\alpha_i$, a quasilinear regression function.
The non-linearity of a quasilinear regression function refers to the regressor
only and not to the parameters.

Regression analysis with quasilinear regression functions can easily be reduced
to multiple linear regression analysis. Setting in (8.19)

$$g_i(x) = x_i, \tag{8.20}$$

we obtain

$$f(x, \alpha_0, \ldots, \alpha_k) = \sum_{i=0}^{k} \alpha_i x_i \quad \text{with } x_0 \equiv 1,$$

so that (8.18) becomes

$$y_i = \sum_{j=0}^{k} \alpha_j x_{ij} + e_i \quad (i = 1, \ldots, n;\ x_{i0} \equiv 1).$$

Apart from the notation, this model equation is identical with (8.4).
In this way a quasilinear regression function can be handled as a multiple linear
regression function.
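The reduction of a quasilinear regression to a multiple linear regression amounts to evaluating the known functions $g_i$ at the observed $x$-values and using the results as columns of the design matrix. The Python sketch below illustrates this under assumptions: the basis functions ($1$, $\ln x$, $\sqrt{x}$), the parameter values and the helper name `design_matrix` are hypothetical and do not come from the text.

```python
import numpy as np


def design_matrix(x, basis):
    """Build X with columns g_0(x), ..., g_k(x) as in (8.19) and (8.20)."""
    return np.column_stack([g(x) for g in basis])


# Hypothetical basis g_0 = 1, g_1 = ln(x), g_2 = sqrt(x); non-linear in x,
# but the regression function remains linear in the parameters alpha_i
basis = [lambda x: np.ones_like(x), np.log, np.sqrt]

x = np.linspace(1.0, 10.0, 20)
alpha_true = np.array([2.0, -1.0, 0.5])       # assumed parameter values

X = design_matrix(x, basis)
y = X @ alpha_true                            # noise-free responses for the check

# Ordinary LS estimation as in (8.5), applied to the transformed regressors
a, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the data are generated without error, the LS estimate `a` reproduces `alpha_true` up to rounding; with random errors the same call gives the LS estimate of the $\alpha_i$.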

Nevertheless a practical important special case will be considered in some 
detail, because we can find simplifications in computation. This special case 
is the polynomial regression function. 


Definition 8.3 If the $g_i(x)$ in (8.19) are polynomials of degree $i$ in $x$, that is, if
$f(x, \alpha_0, \ldots, \alpha_k)$ can be written as

$$f(x, \alpha_0, \ldots, \alpha_k) = \sum_{j=0}^{k} \alpha_j P_j(x) = \sum_{j=0}^{k} \beta_j x^j = P(x, \beta_0, \ldots, \beta_k), \tag{8.21}$$

then we call $f(x, \alpha_0, \ldots, \alpha_k)$ and $P(x, \beta_0, \ldots, \beta_k)$ polynomial regression functions,
respectively.
With Definition 8.2 we can write the model equation of the polynomial
regression as follows:

$$y_i = \sum_{j=0}^{k} \beta_j x_i^j + e_i \quad (i = 1, \ldots, n). \tag{8.22}$$

If the $n$ values $x_i$ of the regressor $x$ are prespecified equidistantly, that is,
we have

$$x_i = a + ih \quad (i = 1, \ldots, n;\ h = \text{const}), \tag{8.23}$$

computation becomes easy in (8.21) if we replace the $P_j(x)$ by the orthogonal
polynomials in $i - \bar{i}$, because the values of these polynomials are tabulated
(Fisher and Yates, 1974).
We demonstrate this first for a regression function quadratic in $x$; afterwards
the procedure is explained for regression functions of arbitrary degree in $x$.


Example 8.2 Orthogonal polynomials for a polynomial regression of
second degree
If the regression function in (8.21) is a polynomial of the second degree in $x$, we
get from (8.22)

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i \quad (i = 1, \ldots, n), \tag{8.24}$$

and this shall be written in the form

$$y_i = \alpha_0 + \alpha_1 P_1(i - \bar{i}) + \alpha_2 P_2(i - \bar{i}) + e_i. \tag{8.25}$$

$P_1$ and $P_2$ shall be orthogonal polynomials, that is, the relations

$$\sum_{i=1}^{n} P_1(i - \bar{i}) P_2(i - \bar{i}) = 0$$

and $\sum_{i=1}^{n} P_j(i - \bar{i}) = 0$, $j = 1, 2$, shall be valid.

Because $\bar{i} = \frac{1}{n}\sum_{i=1}^{n} i = \frac{n(n+1)}{2n} = \frac{n+1}{2}$, we have

$$P_1(i - \bar{i}) = P_1\left(i - \frac{n+1}{2}\right) = c_0 + c_1\left(i - \frac{n+1}{2}\right) \quad (c_1 \ne 0) \tag{8.26}$$

and

$$P_2(i - \bar{i}) = P_2\left(i - \frac{n+1}{2}\right) = d_0 + d_1\left(i - \frac{n+1}{2}\right) + d_2\left(i - \frac{n+1}{2}\right)^2 \quad (d_2 \ne 0). \tag{8.27}$$

The values $c_0, c_1, d_0, d_1$ and $d_2$ have to be chosen in such a way that the
conditions

$$\sum_{i=1}^{n} P_1 P_2 = 0$$

and

$$\sum_{i=1}^{n} P_j = 0 \quad (j = 1, 2)$$

are fulfilled, using short notation.
From (8.23) and (8.24), we get

$$y_i = \beta_0 + \beta_1 (a + ih) + \beta_2 (a + ih)^2 + e_i. \tag{8.28}$$




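Further it follows from (8.25) to (8.27):

$$y_i = \alpha_0 + \alpha_1\left[c_0 + c_1\left(i - \frac{n+1}{2}\right)\right] + \alpha_2\left[d_0 + d_1\left(i - \frac{n+1}{2}\right) + d_2\left(i - \frac{n+1}{2}\right)^2\right] + e_i. \tag{8.29}$$

Expanding (8.28) and (8.29), arranging the result according to powers of $i$ and
comparing the coefficients of these powers leads to

$$\beta_0 + a\beta_1 + a^2\beta_2 = \alpha_0 + \alpha_1\left(c_0 - c_1\frac{n+1}{2}\right) + \alpha_2\left(d_0 - d_1\frac{n+1}{2} + d_2\frac{(n+1)^2}{4}\right),$$

$$h\beta_1 + 2ah\beta_2 = \alpha_1 c_1 + \alpha_2\left(d_1 - (n+1)d_2\right), \qquad h^2\beta_2 = d_2\alpha_2. \tag{8.30}$$

Estimating the $\alpha_j$ by the least squares method and replacing the parameters in
(8.30) by their estimates (Theorem 4.3), model equation (8.24) can be replaced
by (8.25). Equation (8.30) simplifies when choosing $c_0, c_1, d_0, d_1, d_2$ as
mentioned above. Due to
$\sum P_1 = \sum P_2 = \sum P_1 P_2 = 0$, $c_1 \ne 0$, $d_2 \ne 0$, we obtain

$$\sum P_1 = \sum_{i=1}^{n}\left[c_0 + c_1\left(i - \frac{n+1}{2}\right)\right] = n c_0;$$

that is, it follows $c_0 = 0$. Further it is

$$\sum P_2 = n d_0 + d_2\left(\sum_{i=1}^{n} i^2 - \frac{n(n+1)^2}{4}\right) = n d_0 + d_2\frac{n(n-1)(n+1)}{12},$$

because the sum of the squares of 1 to $n$ equals $\frac{n(n+1)(2n+1)}{6}$. Because of
$d_2 \ne 0$, from $\sum P_2 = 0$ follows $d_0 = -d_2\frac{(n-1)(n+1)}{12}$.

From $\sum P_1 P_2 = 0$ follows $d_1 = 0$.
Hence, the orthogonal polynomials $P_1$ and $P_2$ have the form

$$P_1\left(i - \frac{n+1}{2}\right) = c_1\left(i - \frac{n+1}{2}\right), \tag{8.31}$$

$$P_2\left(i - \frac{n+1}{2}\right) = d_2\left[\left(i - \frac{n+1}{2}\right)^2 - \frac{(n-1)(n+1)}{12}\right]. \tag{8.32}$$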
We should choose $c_1$ and $d_2$ so that each polynomial has integer coefficients.
Fisher and Yates (1974) give tables of $P_1$, $P_2$, $\sum P_1^2$ and $\sum P_2^2$.
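The form of $P_1$ and $P_2$ derived in Example 8.2 (linear term $c_1(i - \frac{n+1}{2})$, quadratic term $d_2[(i - \frac{n+1}{2})^2 - \frac{(n-1)(n+1)}{12}]$) can be verified numerically. The following Python sketch is an illustration only; the function name `orthogonal_p1_p2` and the choice $n = 5$, $c_1 = d_2 = 1$ are assumptions.

```python
import numpy as np


def orthogonal_p1_p2(n, c1=1.0, d2=1.0):
    """Values of the first two orthogonal polynomials for i = 1, ..., n.

    c1 and d2 are free scaling constants (both non-zero).
    """
    i = np.arange(1, n + 1)
    u = i - (n + 1) / 2                       # centred index i - (n + 1)/2
    p1 = c1 * u
    p2 = d2 * (u ** 2 - (n - 1) * (n + 1) / 12)
    return p1, p2


p1, p2 = orthogonal_p1_p2(5)
# p1 = [-2, -1, 0, 1, 2], p2 = [2, -1, -2, -1, 2]

sum_p1 = p1.sum()        # should vanish
sum_p2 = p2.sum()        # should vanish
sum_p1p2 = p1 @ p2       # orthogonality condition
```

For $n = 5$ the choice $c_1 = d_2 = 1$ already gives integer values; for other $n$ a different scaling may be needed to obtain integers.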

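We now consider the general case of the quasilinear polynomial regression
and assume w.l.o.g. that (8.21) is already written as

$$P(x, \beta_0, \ldots, \beta_k) = \sum_{j=0}^{k} \beta_j x^j.$$

If we put $x_j = x^j$, we have shown that

$$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + e_i \quad (i = 1, \ldots, n) \tag{8.33}$$

with $x_{ij} = x_i^j$ is of the form (8.4). It only must be shown that

$$X = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^k \\ 1 & x_2 & x_2^2 & \cdots & x_2^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^k \end{pmatrix}$$

is of rank $k + 1$. This certainly is the case if at least $k + 1$ of the $x_i$ are different
from each other and $k + 1 < n$. Let these conditions be fulfilled (assumption of
the polynomial regression). Now Theorem 8.1 can also be applied to the
quasilinear regression. For $X^T X$ and $X^T Y$, we obtain

$$X^T X = \begin{pmatrix} n & \sum x_i & \cdots & \sum x_i^k \\ \sum x_i & \sum x_i^2 & \cdots & \sum x_i^{k+1} \\ \vdots & \vdots & & \vdots \\ \sum x_i^k & \sum x_i^{k+1} & \cdots & \sum x_i^{2k} \end{pmatrix} \quad \text{and} \quad X^T Y = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^k y_i \end{pmatrix}.$$

For equidistant $x_i$ as in Example 8.2 for $k = 2$, the use of orthogonal
polynomials has numerical advantages.
With modern computer programs, of course, such transformations are no longer
needed. Nevertheless, we consider this special case.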


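Theorem 8.3 If in model (8.33) the $x_i$ are equidistant, that is, they can be written
in the form (8.23), and if $P_j\left(i - \frac{n+1}{2}\right)$ are orthogonal polynomials of degree $j$ in
$i - \frac{n+1}{2}$ $(j = 0, \ldots, k;\ i = 1, \ldots, n)$ so that

$$\sum_{j=0}^{k} \beta_j x_i^j = \sum_{j=0}^{k} \alpha_j P_j\left(i - \frac{n+1}{2}\right), \qquad P_0\left(i - \frac{n+1}{2}\right) \equiv 1, \tag{8.34}$$

then an LS estimator of the vector $\alpha = (\alpha_0, \ldots, \alpha_k)^T$ is given by

$$a = \left(\bar{y},\ \frac{\sum_{i=1}^{n} y_i P_1\left(i - \frac{n+1}{2}\right)}{\sum_{i=1}^{n} P_1^2\left(i - \frac{n+1}{2}\right)},\ \ldots,\ \frac{\sum_{i=1}^{n} y_i P_k\left(i - \frac{n+1}{2}\right)}{\sum_{i=1}^{n} P_k^2\left(i - \frac{n+1}{2}\right)}\right)^T. \tag{8.35}$$

The LS estimator of $\beta = (\beta_0, \ldots, \beta_k)^T$ is

$$b = U^{-1} W a \tag{8.36}$$

with $U$ and $W$ obtained from (8.34) (by comparison of coefficients) so that

$$U\beta = W\alpha. \tag{8.37}$$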

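Proof: With

$$X = \begin{pmatrix} 1 & P_1\left(1 - \frac{n+1}{2}\right) & \cdots & P_k\left(1 - \frac{n+1}{2}\right) \\ \vdots & \vdots & & \vdots \\ 1 & P_1\left(n - \frac{n+1}{2}\right) & \cdots & P_k\left(n - \frac{n+1}{2}\right) \end{pmatrix}$$

the right-hand side of (8.34) with $P_0 \equiv 1$ is presentable as $X\alpha$, and by this (8.33) is
presentable as $Y = X\alpha + e$, which is an equation of the form (8.4). Further (8.35)
is a special case of (8.5), because

$$X^T X = \begin{pmatrix} n & \sum P_1 & \cdots & \sum P_k \\ \sum P_1 & \sum P_1^2 & \cdots & \sum P_1 P_k \\ \vdots & \vdots & & \vdots \\ \sum P_k & \sum P_1 P_k & \cdots & \sum P_k^2 \end{pmatrix}$$

and

$$(X^T X)^{-1} = \begin{pmatrix} \frac{1}{n} & & & 0 \\ & \frac{1}{\sum P_1^2} & & \\ & & \ddots & \\ 0 & & & \frac{1}{\sum P_k^2} \end{pmatrix}$$

is a diagonal matrix, and

$$X^T Y = \begin{pmatrix} \sum y_i \\ \sum P_1 y_i \\ \vdots \\ \sum P_k y_i \end{pmatrix}.$$

Equation (8.36) follows from the Gauss-Markov theorem (Theorem 4.3).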
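$X^T X$ becomes, by a suitable choice of the coefficients in

$$P_j\left(i - \frac{n+1}{2}\right) = k_{0j} + k_{1j}\left(i - \frac{n+1}{2}\right) + k_{2j}\left(i - \frac{n+1}{2}\right)^2 + \cdots + k_{jj}\left(i - \frac{n+1}{2}\right)^j,$$

a diagonal matrix. The values of the polynomials are uniquely fixed by $i$, $j$ and
$n$, and they are tabulated.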
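For $k = 2$, the estimator (8.35) and the back-transformation to the coefficients of the ordinary polynomial can be sketched in Python as follows. This is an illustration under assumptions, not the authors' computation: the design constants `a0`, `h`, the noise-free quadratic response and the scaling $c_1 = d_2 = 1$ are chosen only for the example.

```python
import numpy as np

# Hypothetical equidistant design x_i = a0 + i*h as in (8.23)
n, a0, h = 7, 10.0, 2.5
i = np.arange(1, n + 1)
x = a0 + i * h

# Noise-free quadratic response with assumed coefficients beta_0, beta_1, beta_2
y = 3.0 - 0.4 * x + 0.05 * x ** 2

# Columns P_0 = 1, P_1, P_2 with c_1 = d_2 = 1 (see (8.31) and (8.32))
u = i - (n + 1) / 2
P = np.column_stack([np.ones(n), u, u ** 2 - (n - 1) * (n + 1) / 12])

# (8.35): because P^T P is diagonal, each a_j is a simple ratio of sums
a = (P.T @ y) / np.sum(P ** 2, axis=0)
fitted = P @ a

# Last relation in (8.30) with d_2 = 1: h^2 * beta_2 = alpha_2
beta2 = a[2] / h ** 2

# Cross-check against an ordinary degree-2 polynomial fit in x
coef = np.polyfit(x, y, 2)
gram = P.T @ P            # should be diagonal
```

Each coefficient `a[j]` is obtained without solving a system of equations, which was the computational advantage before regression software became common.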


Example 8.3 In a carotene storage experiment, it should be investigated 
whether the change of the carotene content in grass depends upon the method 
of storage. For this the grass was stored in a sack and in a glass. During the time 
of storage in days, samples were taken and their carotene content was ascer- 
tained. Table 8.1 shows the results for both kinds of storage. 

The relationship between carotene content and time of storage is modelled by 
Equation (8.8); the side conditions and the additional assumptions may be ful- 
filled. This relationship must be modelled by model I, because the time of stor- 
age is not a random variable; its values are chosen by the experimenter. 


Hints for SPSS

In contrast to the ANOVA, SPSS makes no distinction in regression analysis
between models with fixed and random regressors. Therefore the correlation
coefficient defined in Section 8.5 is always calculated, also for model I, where
it is meaningless and must be ignored.




Table 8.1 Carotene content $y$ (in mg/100 g dry matter) of grass in dependency on the time of
storage $x$ (in days) for two kinds of storage as SPSS input (columns 4 to 7 are
explained later).


Time   Sack      Glass     Pre_1      Res_1      Pre_2      Res_2
   1   31.2500   31.2500   31.16110   31.16110   31.16110    0.08890
  60   28.7100   30.4700   27.94238   27.94238   27.94238    0.76762
 124   23.6700   20.3400   24.45089   24.45089   24.45089   -0.78089
 223   18.1300   11.8400   19.04999   19.04999   19.04999   -0.91999
 303   15.5300    9.4500   14.68563   14.68563   14.68563    0.84437


We choose in SPSS (after data input):


Analyze
Regression
Linear


and consider storage in a sack as well as in a glass. For the data matrix of
Table 8.1, estimates of $\beta_{i0}$, $\beta_{i1}$, $E(y_{ij})$, $\mathrm{var}(\hat{\beta}_{i0})$, $\mathrm{var}(\hat{\beta}_{i1})$ and
$\mathrm{cov}(\hat{\beta}_{i0}, \hat{\beta}_{i1})$ for $i = 1$ shall be given. The case $i = 2$ is left to the reader.

By (8.9) and (8.10) the estimates $b_{10}$ and $b_{11}$ as well as $b_{20}$ and $b_{21}$ can be
calculated by SPSS. The estimates $\hat{y}_{1j}$ of $E(y_{1j})$ and $\hat{y}_{2j}$ of $E(y_{2j})$ are given in
Table 8.1 as PRE_1 and PRE_2, respectively. RES_1 and RES_2 are the
differences $y_{1j} - \hat{y}_{1j}$ and $y_{2j} - \hat{y}_{2j}$, respectively.

To obtain these values in SPSS (Figure 8.1), we go to the button ‘Save’
and activate ‘Predicted values’ and ‘Residuals’ there. Do this for each kind of
storage. The results appear in the data matrix. To get the covariance matrix,
we choose ‘Covariance matrix’ under ‘Statistics’.

The regression coefficients, standard deviations of the estimates and the
covariance $\mathrm{cov}(\hat{\beta}_{10}, \hat{\beta}_{11})$ between both coefficients are shown in Figure 8.2.

The ANOVA table, also named ANOVA table in SPSS, is explained in the
next section. We obtain


$b_{11} = -0.055$,
$b_{10} = 31.216$.

Further we find in Figure 8.2, in the column to the right of the coefficients (Std
error), the values $\hat{\sigma}_0 = \sqrt{\widehat{\mathrm{var}}(b_{10})} = 0.7060$ and $\hat{\sigma}_1 = \sqrt{\widehat{\mathrm{var}}(b_{11})} =
0.0046$.

Figure 8.1 Introduction to regression analysis in SPSS. Source: Reproduced with permission 
of IBM. 


[SPSS Viewer output for the dependent variable ‘sack’ with predictor ‘time’: ANOVA table
(residual sum of squares 2.766 with 3 degrees of freedom, mean square 0.922, F = 191.787),
coefficients table (constant 31.216 with standard error 0.706; time -0.055) and
coefficient covariance for time 1.552E-5.]


Figure 8.2 SPSS output for storage in sack of Example 8.3. Source: Reproduced with
permission of IBM.
Because in the ANOVA table in the row ‘Residual’ $s^2 = \frac{2.7666}{3} = 0.922$ is found,
the estimates of $\sigma_0^2$, $\sigma_1^2$ and $\sigma_{01}$ can easily be calculated from the
formulas above by using $s^2$ in place of $\sigma^2$.
For $i = 2$ we obtain analogously

$b_{21} = -0.081$, $b_{20} = 32.185$.
The equations of the estimated regression lines are

$$i = 1:\ \hat{y}_{1j} = 31.216 - 0.055\,x_j,$$
$$i = 2:\ \hat{y}_{2j} = 32.185 - 0.081\,x_j \qquad (1 \le x_j \le 303).$$


For the estimated regression function, we should always give the region of the 
values of the regressor because any extrapolation of the regression curve outside 
this region is dangerous. Both estimated regression lines are shown in Figure 8.3. 
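The estimates reported for Example 8.3 can be reproduced from the first three columns of Table 8.1 with the formulas $b_1 = SP_{xy}/SS_x$, $b_0 = \bar{y} - b_1\bar{x}$ and (8.11). The following Python sketch is an independent check outside SPSS; the function name `fit_line` is an assumption.

```python
import numpy as np

# Data from Table 8.1: storage time in days, carotene content in sack and glass
time = np.array([1.0, 60.0, 124.0, 223.0, 303.0])
sack = np.array([31.25, 28.71, 23.67, 18.13, 15.53])
glass = np.array([31.25, 30.47, 20.34, 11.84, 9.45])


def fit_line(x, y):
    """Return (b0, b1, s2) of the simple linear regression of y on x."""
    n = x.size
    ss_x = np.sum(x ** 2) - np.sum(x) ** 2 / n
    ss_y = np.sum(y ** 2) - np.sum(y) ** 2 / n
    sp_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
    b1 = sp_xy / ss_x                          # slope
    b0 = y.mean() - b1 * x.mean()              # intercept
    s2 = (ss_y - sp_xy ** 2 / ss_x) / (n - 2)  # residual variance (8.11)
    return b0, b1, s2


b10, b11, s2_sack = fit_line(time, sack)       # approx. 31.216, -0.055, 0.922
b20, b21, s2_glass = fit_line(time, glass)     # approx. 32.185, -0.081
```

The fitted lines are only meaningful for storage times between 1 and 303 days, in line with the warning about extrapolation.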

If we are not sure, whether the regression is linear or not, we use another 
branch of SPSS, namely, 


Analyze 
Regression 
Curve Estimation 


to calculate polynomials or 


Analyze 
Regression 
Nonlinear 


for intrinsically non-linear regression as described in Chapter 9. 




Figure 8.3 Estimated regression lines of Example 8.3. 


Figure 8.4 SPSS—window for curvilinear regression. Source: Reproduced with permission 
of IBM. 


We demonstrate the first way for linear, quadratic and cubic polynomials for
the trait ‘sack’. First we reach the dialog window shown in Figure 8.4.

The graphical output is shown in Figure 8.5.

Here we can see how dangerous an extrapolation outside the interval [1, 303]
can be. The cubic regression function goes up after 303 days; this certainly is
impossible. Within the interval [1, 303] no large differences can be found between
the three curves, and the coefficients of the quadratic and cubic terms are not
significant; the numerical output is left to the reader.


8.2.2. Optimal Experimental Design 


In this section, the optimal choice of $X$ in model equation (8.4) for estimating
$\beta$ is described. We assume that the size $n$ of the experiment is already given and
that $\beta$ or $X\beta$ must be estimated by its LS estimator. Rasch and Herrendörfer (1982)
discuss the problem that $X$, $n$ and the estimator of $\beta$ have to be chosen
simultaneously.

Let $X = (x_1, \ldots, x_n)^T$ and let $B$ be the domain of $R^{k+1}$ in which the row vectors
$x_i^T$ of $X$ are located. $B$ is called the experimental region. $\{Z_n\}$ is the set of those $X$
for which $x_i \in B$. We now call $X$ a design matrix. In contrast to the discrete and
continuous experimental designs introduced below, we understand by $X$ the design
matrix of a concrete (existing) design. In the sequel we briefly call $X$ a design.


Figure 8.5 The regression curves for the linear, quadratic and cubic regression. 
Source: Reproduced with permission of IBM. 


In the theory of optimal experimental designs as in Kiefer (1959), Fedorov 
(1971) and Melas (2008), the following definitions are important. 


Definition 8.4 Each set of pairs

$$\xi_m = \begin{Bmatrix} x_1 & x_2 & \cdots & x_m \\ p_1 & p_2 & \cdots & p_m \end{Bmatrix} \tag{8.38}$$

with $x_i \in B$, $0 < p_i \le 1$ $(i = 1, \ldots, m)$, $x_i \ne x_j$ for $i \ne j$ $(i, j = 1, \ldots, m)$ and
$\sum_{i=1}^{m} p_i = 1$ is called a discrete $m$-point design; the $p_i$ are called weights and
$\{x_1, \ldots, x_m\}$ is called the support of $\xi_m$.


Definition 8.5 Each probability measure $\xi$ on the measurable space $(B, \mathfrak{B})$ is
called a continuous design.


$\xi_m$, as a discrete design, is a special case of a continuous design for a discrete
probability measure. A concrete design has the form of a discrete design with
$p_i = \frac{k_i}{n}$ and $k_i$ integer.

The problem is to construct a discrete or continuous design in such a way that
the covariance matrix of $\hat{\beta}$ meets some optimality criteria. The optimality
criteria in this section concern a functional $\Phi$ mapping $(X^T X)^{-1}$ into $R^1$. We
define the optimality for concrete designs; the definitions for discrete and
continuous designs are left to the reader.


Definition 8.6 A concrete design $X^*$ is called $\Phi$-optimal for a regression
model $Y = X\beta + e$ with $E(e) = 0_n$, $\operatorname{var}(e) = \sigma^2 I_n$ for fixed $n$ and $B$, if

$$\min_{X \in \{X_n\}} \Phi\left[(X^T X)^{-1}\right] = \Phi\left[(X^{*T} X^*)^{-1}\right]. \tag{8.39}$$


In particular, with $M = (X^T X)^{-1}$, a $\Phi$-optimal design is called for


- $\Phi(M) = |M|$  D-optimal,
- $\Phi(M) = \operatorname{tr}(M)$  A-optimal,
- $\Phi(M) = \max_{x \in B} x^T M x$  G-optimal,
- $\Phi(M) = \lambda_{\max}(M)$  E-optimal with $\lambda_{\max}$ as maximal eigenvalue of $M$,
- $\Phi(M) = c^T M c$  C-optimal with $c = (c_1, \ldots, c_p)^T$, $p = k + 1$.


The C-optimality is of importance if the variance of a linear contrast $c^T\beta$ of the
parameter vector must be minimised. If we wish to extrapolate the results of an
experiment from the experimental region $B$ to a region $B^*$ (prediction), we replace
$B$ by $B^*$ in the G-optimality criterion.

From a theorem of Kiefer (1959), we know that discrete or continuous (but
not always concrete!) designs are D-optimal exactly if they are G-optimal. From
the same theorem it follows that for special $B$ (e.g. in $R^2$) the support of a discrete
D- (G-) optimal design contains only points where $\operatorname{var}(\hat y)$ is maximal, that is,
for which

maxx? (XX) ys Sie 
xe B 

We restrict ourselves to the G- or D-optimality for the simple linear regres- 
sion. Jung (1973) gives a systematic investigation of the construction of concrete 
optimal designs. Some of his results for a special case of model equation (8.4) are 
given in the sequel. Concerning proofs see his paper. 

At first we consider the case of Example 8.1 for $k = 1$ ($p = 2$), with

$$X^T X = \begin{pmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{pmatrix}, \qquad (X^T X)^{-1} = \frac{1}{|X^T X|}\begin{pmatrix} \sum_{i=1}^n x_i^2 & -\sum_{i=1}^n x_i \\ -\sum_{i=1}^n x_i & n \end{pmatrix} \tag{8.40}$$

and the experimental region $B = [a, b]$. Then the design with $m = 2$, the support
$\{a, b\}$ and the weights $p_1 = p_2 = 1/2$ is a discrete D-optimal design. For integer
$n/2$ this is of course also a concrete D- (and G-) optimal design, where half of the
y-values are observed at each boundary point of the interval. This fact is a special
case of the following theorems.


Theorem 8.4 A concrete design with the matrix $X = (x_1, \ldots, x_n)^T$ with
$x_i^T = (1, x_i)$, $B = \{x_i \mid x_i \in [a, b]\}$ and $n \ge 2$ is G-optimal if and only if

a) for even $n$, $n/2$ of the $x_i$ are equal to $a$ and $n/2$ are equal to $b$;

b) for odd $n$, $(n-1)/2$ of the $x_i$ are equal to $a$, $(n-1)/2$ of the $x_i$ are equal to $b$
and one $x_i$ equals $(a+b)/2$.


It can be shown that for odd $n$ concrete D- and G-optimal designs are not
identical.


Theorem 8.5 Under the assumptions of Theorem 8.4, $X$ is D-optimal if and only if

a) for even $n$, $n/2$ of the $x_i$ are equal to $a$ and $n/2$ are equal to $b$;

b) for odd $n$, $(n-1)/2$ of the $x_i$ are equal to $a$, $(n-1)/2$ are equal to $b$ and the
remaining $x_i$ is either $a$ or $b$.


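For the case $n = 5$ with $a = -1$ and $b = 1$,

$$X_G^T = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ -1 & -1 & 0 & 1 & 1 \end{pmatrix}$$

is a G-optimal design and

$$X_D^T = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ -1 & -1 & 1 & 1 & 1 \end{pmatrix}$$

is a D-optimal design in $[-1, 1]$. We have $|X_G^T X_G| = 20$ and $|X_D^T X_D| = 24$.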
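
The comparison of the two designs can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names are illustrative assumptions, and the G-criterion is approximated by a maximum over a grid on $[-1, 1]$ for the model with rows $(1, x_i)$.

```python
# Compare the two n = 5 designs on B = [-1, 1] for the model y = b0 + b1*x.
# Pure-Python 2x2 algebra; all function names are illustrative only.

def xtx(xs):
    """Return the entries of X^T X for rows (1, x): n, sum x, sum x^2."""
    n = len(xs)
    return n, sum(xs), sum(x * x for x in xs)

def det_xtx(xs):
    """|X^T X| = n * sum(x^2) - (sum x)^2 (to be maximised for D-optimality)."""
    n, s1, s2 = xtx(xs)
    return n * s2 - s1 * s1

def pred_var(xs, x0):
    """(1, x0) (X^T X)^{-1} (1, x0)^T: variance of y-hat at x0 up to sigma^2."""
    n, s1, s2 = xtx(xs)
    return (s2 - 2.0 * x0 * s1 + n * x0 * x0) / det_xtx(xs)

def g_criterion(xs, a=-1.0, b=1.0, steps=200):
    """Maximum of pred_var over a grid on [a, b] (to be minimised)."""
    grid = [a + (b - a) * i / steps for i in range(steps + 1)]
    return max(pred_var(xs, x0) for x0 in grid)

design_g = [-1, -1, 0, 1, 1]   # design of Theorem 8.4 b)
design_d = [-1, -1, 1, 1, 1]   # design of Theorem 8.5 b)

print(det_xtx(design_g), det_xtx(design_d))          # 20 24
print(g_criterion(design_g), g_criterion(design_d))  # about 0.45 and 0.5
```

The second design has the larger determinant, while the first has the smaller maximal prediction variance, in line with the statement that for odd $n$ the concrete D- and G-optimal designs differ.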


8.3. Testing Hypotheses 


The parameter vector $\beta = (\beta_0, \ldots, \beta_k)^T$ lies in a $(k+1)$-dimensional vector space
$\Omega$. If $q < k + 1$ of the $\beta_j$ $(j = 0, \ldots, k)$ equal zero (or some other fixed number), then
$\beta$ lies in a $(k + 1 - q)$-dimensional subspace $\omega$ of $\Omega$. In
Theorem 4.7 it was shown that the components of $\beta$ can always be renumbered
in such a way that the first $q$ components are the restricted ones. We then say
that the conditions are given in canonical form (Definition 4.2). We restrict
ourselves to the case $\beta_0 = \cdots = \beta_{q-1} = 0$, but remember that after renumbering $\beta_0$ is
no longer necessarily the constant of the regression equation.

The hypothesis $H_0$ that these conditions hold, that is,

$$\beta \in \omega \quad \text{or} \quad \beta_0 = \cdots = \beta_{q-1} = 0 \tag{8.41}$$

shall be tested against the alternative $\beta \in \Omega \setminus \omega$.
Theorem 8.6 If $Y$ in (8.4) is $N(X\beta, \sigma^2 I_n)$-distributed, the null hypothesis $H_0$
that (8.41) holds can be tested against the alternative hypothesis that $H_0$ does not
hold with the test statistic

$$F = \frac{n-k-1}{q}\,\frac{Y^T\left[X(X^T X)^{-1}X^T - X_1(X_1^T X_1)^{-1}X_1^T\right]Y}{Y^T\left(I_n - X(X^T X)^{-1}X^T\right)Y}. \tag{8.42}$$

If $H_0$ holds, $F$ in (8.42) is central F-distributed with $q$ and $n-k-1$ degrees of
freedom. $X_1$ is the $[n \times (k+1-q)]$ matrix with the last $k+1-q$ columns of $X$.


Proof: The statement of this theorem follows from Example 4.3.


This result can be summarised by an ANOVA table (Table 8.2) putting
$\hat\beta = (X^T X)^{-1}X^T Y$ and $\hat\gamma = (X_1^T X_1)^{-1}X_1^T Y$. This table is a special case of
Table 4.1.

If $q = 1$, then $F = t^2$, and if the null hypothesis holds, $F$ is the square of a central
t-distributed random variable with $n-k-1$ degrees of freedom. In this case
(8.42) becomes very simple.


Corollary 8.2 If $Y$ in (8.4) is $N(X\beta, \sigma^2 I_n)$-distributed, the null hypothesis
$H_0: \beta_j = 0$ against $H_A: \beta_j \ne 0$ $(j = 0, \ldots, k)$ can be tested by the test statistic

$$t_j = \frac{b_j}{s\sqrt{c_{jj}}}. \tag{8.43}$$

In (8.43) $b_j = \hat\beta_j$ is the $(j+1)$-th component of the estimated parameter
vector, $s$ is the square root of $s^2$ in (8.6) and $c_{jj}$ is the $(j+1)$-th main diagonal
element of $C = (X^T X)^{-1}$; if $H_0: \beta_j = 0$ holds, $t_j$ is central t-distributed with $n-k-1$
degrees of freedom.


Table 8.2 Analysis of variance table for testing the hypothesis $H_0: \beta_0 = \beta_1 = \cdots = \beta_{q-1} = 0$.

Source of variation                        SS                                        df        MS            Test statistic
Total                                      $Y^T Y$                                   $n$
$H_0: \beta_0 = \cdots = \beta_{q-1} = 0$  $Y^T X\hat\beta - Y^T X_1\hat\gamma = Z$  $q$       $Z/q$         $\frac{n-k-1}{q}\,\frac{Z}{N}$
Residual                                   $Y^T Y - Y^T X\hat\beta = N$              $n-k-1$   $N/(n-k-1)$
Regression                                 $Y^T X_1\hat\gamma$                       $k+1-q$


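Proof: We assume that the null hypothesis is in canonical form $H_0: \beta_0 = 0$. If $x_0$
is the first column of $X$ and $X_1$ the matrix of the $k$ remaining columns of $X$, then
$X = (x_0, X_1)$ and

$$X^T X = \begin{pmatrix} x_0^T x_0 & x_0^T X_1 \\ X_1^T x_0 & X_1^T X_1 \end{pmatrix}.$$

We decompose the symmetric inverse $C$ in submatrices of the same type and
obtain

$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}$$

($C_{11}$ is a scalar). Then we have

$$(X_1^T X_1)^{-1} = C_{22} - C_{21} C_{11}^{-1} C_{12},$$

and $Z = Y^T\left[X(X^T X)^{-1}X^T - X_1(X_1^T X_1)^{-1}X_1^T\right]Y$ in (8.42) becomes

$$Z = Y^T\left(x_0 C_{11} x_0^T + X_1 C_{21} x_0^T + x_0 C_{12} X_1^T + X_1 C_{21} C_{11}^{-1} C_{12} X_1^T\right)Y.$$

It follows now from (8.5)

$$b_0 = (C_{11}\;\; C_{12})\begin{pmatrix} x_0^T \\ X_1^T \end{pmatrix} Y$$

or

$$b_0^2 = Y^T\left(x_0 C_{11} C_{11} x_0^T + x_0 C_{11} C_{12} X_1^T + X_1 C_{21} C_{11} x_0^T + X_1 C_{21} C_{12} X_1^T\right)Y.$$

Using $b_0^2 = b_0^T b_0$ and $C_{12}^T = C_{21}$ shows that $Z$ can be written as $C_{11}^{-1} b_0^2$. $C_{11}$
contains one element $c_{00}$ only so that $C_{11}^{-1} = 1/c_{00}$. Therefore (8.42) becomes

$$F = \frac{b_0^2}{c_{00}\,s^2}$$

or, going back to the original hypothesis,

$$F = \frac{b_j^2}{c_{jj}\,s^2} = t_j^2,$$

and this completes the proof.

It is easy to see that under the hypothesis $\beta_j = \beta_j^*$ the test statistic

$$t = \frac{b_j - \beta_j^*}{s\sqrt{c_{jj}}} \tag{8.44}$$

is central t-distributed with $n-k-1$ degrees of freedom.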
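
For $q = 1$ the relation $F = t^2$ can be checked numerically. The sketch below is an illustration only: it uses hypothetical data and the simple linear regression ($k = 1$), where the numerator sum of squares of (8.42) equals the difference between the residual sums of squares of the reduced model (constant only) and the full model. The helper name `fit_line` is an assumption made for this example.

```python
import math

def fit_line(x, y):
    """Least squares fit y = b0 + b1*x; returns b0, b1, residual SS and SS_x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - xbar) ** 2 for xi in x)
    sp_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sp_xy / ss_x
    b0 = ybar - b1 * xbar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, ss_res, ss_x

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]
n, k = len(x), 1

b0, b1, ss_res, ss_x = fit_line(x, y)
s2 = ss_res / (n - k - 1)                    # residual variance s^2

# Reduced model under H0: beta_1 = 0 keeps only the constant column.
ybar = sum(y) / n
ss_res_reduced = sum((yi - ybar) ** 2 for yi in y)

z = ss_res_reduced - ss_res                  # numerator SS of (8.42) for q = 1
f_stat = z / s2                              # (n - k - 1) * Z / N
t_stat = b1 / math.sqrt(s2 / ss_x)           # (8.43) with c_11 = 1 / SS_x

print(f_stat, t_stat ** 2)                   # agree up to rounding error
```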
To test hypotheses of the form $H_0: \beta = \beta^*$, for which $\omega$ contains only one point
and has dimension 0, we need the following theorem.


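Theorem 8.7 If $Y$ in (8.4) is $N(X\beta, \sigma^2 I_n)$-distributed, the hypothesis $H_0: \beta = \beta^*$
can be tested against the alternative hypothesis $\beta \ne \beta^*$, due to
$X(X^T X)^{-1}X^T = X(X^T X)^{-1}X^T X(X^T X)^{-1}X^T$ and (8.5), with the test statistic

$$F = \frac{n-k-1}{k+1}\,\frac{(Y - X\beta^*)^T X(X^T X)^{-1}X^T (Y - X\beta^*)}{Y^T\left(I_n - X(X^T X)^{-1}X^T\right)Y} = \frac{1}{(k+1)s^2}\,(b - \beta^*)^T X^T X (b - \beta^*). \tag{8.45}$$

$F$ in (8.45) is non-central F-distributed ($F(k+1, n-k-1, \lambda)$) with
non-centrality parameter

$$\lambda = \frac{1}{\sigma^2}(\beta - \beta^*)^T (X^T X)(\beta - \beta^*).$$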


Proof: Because for $\theta = \theta^*$

$$\max_{\sigma^2} L(\theta, \sigma^2 \mid Y) = \frac{n^{n/2} e^{-n/2}}{(2\pi)^{n/2}\,\|Y - \theta^*\|^n}$$

holds, $Q$ in (4.18) becomes

$$Q = \left[\frac{\|Y - AY\|^2}{\|Y - X\beta^*\|^2}\right]^{n/2}.$$

The orthogonal projection $A$ of $R^n$ on $\Omega$ is idempotent, and therefore
$\theta^* = A\theta^*$ and $\|Y - \theta^*\|^2 - \|Y - AY\|^2 = (Y - \theta^*)^T A (Y - \theta^*)$.
The test statistic $F$ in (4.19) has via Example 4.3 the form (8.45).


Example 8.4 We consider the simple linear regression of Example 8.1 and
use its symbols. We assume that the $e_i$ in (8.8) are independent of each other
and $N(0, \sigma^2)$-distributed. If $\sigma^2$ is known, the hypothesis $H_0: \beta_0 = \beta_0^*$ can be tested
with the test statistic

$$z_0 = \frac{b_0 - \beta_0^*}{\sigma_0} = \frac{b_0 - \beta_0^*}{\sigma}\sqrt{\frac{n\sum (x_i - \bar x)^2}{\sum x_i^2}} \tag{8.46}$$

and the hypothesis $H_0: \beta_1 = \beta_1^*$ with the test statistic

$$z_1 = \frac{b_1 - \beta_1^*}{\sigma_1} = \frac{b_1 - \beta_1^*}{\sigma}\sqrt{\sum (x_i - \bar x)^2}.$$

Due to Corollary 8.1, $z_0$ and $z_1$ are $N(0, 1)$-distributed if the corresponding null
hypothesis holds. If $\sigma^2$ is not known, it follows from (8.44) that if the hypothesis
$\beta_0 = \beta_0^*$ holds, then

$$t = \frac{b_0 - \beta_0^*}{s}\sqrt{\frac{n\sum (x_i - \bar x)^2}{\sum x_i^2}} = \frac{b_0 - \beta_0^*}{s_0} \tag{8.47}$$

is central $t(n-2)$-distributed, because in Example 8.1 it was shown that in the
simple linear regression

$$c_{00} = \frac{\sum x_i^2}{n\sum (x_i - \bar x)^2}.$$

Because $c_{11} = \dfrac{1}{\sum (x_i - \bar x)^2}$, it follows from (8.44)
that if $H_0: \beta_1 = \beta_1^*$ holds, the test statistic

$$t = \frac{b_1 - \beta_1^*}{s}\sqrt{\sum (x_i - \bar x)^2} = \frac{b_1 - \beta_1^*}{s_1} \tag{8.48}$$

is $t(n-2)$-distributed.

The null hypothesis $H_0: \beta_0 = \beta_0^*$ (or $H_0: \beta_1 = \beta_1^*$) is rejected with a first kind
risk $\alpha$ and the alternative hypothesis $\beta_0 > \beta_0^*$ (or $\beta_1 > \beta_1^*$) accepted, if for $t$ in
(8.47) (or in (8.48)) $t > t(n-2 \mid 1-\alpha)$. We accept the alternative hypothesis
$\beta_0 < \beta_0^*$ ($\beta_1 < \beta_1^*$) if for $t$ in (8.47) (or in (8.48)) $t < t(n-2 \mid \alpha)$. For a two-sided
alternative hypothesis $H_A: \beta_0 \ne \beta_0^*$ (or $\beta_1 \ne \beta_1^*$), the null hypothesis is rejected
with a first kind risk $\alpha$ if with $t$ in (8.47) (or in (8.48)) $|t| > t\left(n-2 \mid 1-\frac{\alpha}{2}\right)$.


The hypothesis $\beta_1 = 0$ means that the random variable $y$ is independent of the
regressor.

To test the null hypothesis $\beta = \beta^*$, that is, $\beta_0 = \beta_0^*$, $\beta_1 = \beta_1^*$, we use the test
statistic (8.45) of Theorem 8.7, and because by (8.40)

$$X^T X = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}$$

we receive the test statistic

$$F = \frac{n(b_0 - \beta_0^*)^2 + 2\sum x_i\,(b_0 - \beta_0^*)(b_1 - \beta_1^*) + \sum x_i^2\,(b_1 - \beta_1^*)^2}{2s^2}. \tag{8.49}$$

If the null hypothesis holds, $F$ is central $F(2, n-2)$-distributed. The null hypothesis
$H_0: \beta_0 = \beta_0^*, \beta_1 = \beta_1^*$ is rejected with the first kind risk $\alpha$ if $F > F(2, n-2 \mid 1-\alpha)$;
here $F(2, n-2 \mid 1-\alpha)$ is the $(1-\alpha)$-quantile of the F-distribution with 2 and
$n-2$ degrees of freedom.
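
The statistic (8.49) needs only a few summary values. The following Python sketch is a hedged illustration: the function name `f_stat_849` is an assumption, and the numbers are the summary values that appear in Examples 8.5 and 8.6 of this section ($n = 5$, $\sum x_i = 711$ for the storage times 1, 60, 124, 223, 303, $\sum x_i^2 = 160515$, $b_0 = 31.215$, $b_1 = -0.05455$, $s^2 = 0.92203$).

```python
def f_stat_849(n, sum_x, sum_x2, b0, b1, beta0_star, beta1_star, s2):
    """F of (8.49): (b - beta*)^T X^T X (b - beta*) divided by 2 s^2."""
    d0 = b0 - beta0_star
    d1 = b1 - beta1_star
    quad = n * d0 ** 2 + 2.0 * sum_x * d0 * d1 + sum_x2 * d1 ** 2
    return quad / (2.0 * s2)

# H0: beta_0 = 30, beta_1 = 0, with the summary values quoted in the lead-in.
F = f_stat_849(n=5, sum_x=711.0, sum_x2=160515.0,
               b0=31.215, b1=-0.05455,
               beta0_star=30.0, beta1_star=0.0, s2=0.92203)
print(round(F, 1))   # 211.9
```

The value agrees with the F entry of the ANOVA table of Example 8.5 up to rounding of the inputs.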

Usually the steps in calculating the F-test statistic are presented in an ANOVA
table, as already discussed in Chapter 5. We decompose SS-total, that is,
$SS_T = \sum_{i=1}^n (y_i - \beta_0^* - \beta_1^* x_i)^2$, the sum of squared deviations of the
observed values from the corresponding values of the regression function if the null
hypothesis holds,

$$E(y_i) = \beta_0^* + \beta_1^* x_i,$$

into two components. The first component contains that part of $SS_T$ originating from
the deviations of the estimated regression line $\hat y = b_0 + b_1 x$ from the regression
line given by the null hypothesis. This first component is called SS-regression
($SS_{Regr.}$). The other component contains that part of $SS_T$ originating from
the deviations of the observed values $y_i$ from the values $\hat y_i$ of the estimated
regression function; this component is called SS-residual ($SS_{res}$). Analogously
the degrees of freedom are decomposed. The SS after transition to random
variables are

$$SS_T = \sum_{i=1}^n (y_i - \beta_0^* - \beta_1^* x_i)^2,$$

$$SS_{Regr.} = \sum_{i=1}^n (b_0 + b_1 x_i - \beta_0^* - \beta_1^* x_i)^2,$$

$$SS_{res} = \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2.$$

Because $SS_{Regr.}$ is the numerator of $F$ in (8.49) and $SS_{res} = (n-2)s^2$, the
relation $SS_T = SS_{Regr.} + SS_{res}$ follows. The ANOVA table is Table 8.3.
Table 8.3 Analysis of variance table for testing the hypothesis $H_0: \beta_0 = \beta_0^*, \beta_1 = \beta_1^*$.

Source of variation   SS             df      MS               F
Total                 $SS_T$         $n$
Regression            $SS_{Regr.}$   2       $SS_{Regr.}/2$   $SS_{Regr.}/(2s^2)$
Residual              $SS_{res}$     $n-2$   $s^2$


Often we are interested in testing whether two regression equations

$$y_{1j} = \beta_{10} + \beta_{11} x_{1j} + e_{1j}, \qquad y_{2j} = \beta_{20} + \beta_{21} x_{2j} + e_{2j},$$

with parameters from two groups of $n_1$ and $n_2$ observed pairs $(y_{1j}, x_{1j})$ and
$(y_{2j}, x_{2j})$, respectively, have the same slope. The hypothesis $H_0: \beta_{11} = \beta_{21}$ has to be
tested. For the model equations for $y_{1j}$ and $y_{2j}$, the side conditions and the
additional assumptions of the model equation (8.8) are assumed to be fulfilled. From (8.9)
estimates $b_{i1}$ for $\beta_{i1}$ $(i = 1, 2)$ are

$$b_{i1} = \frac{n_i\sum_{j=1}^{n_i} x_{ij}y_{ij} - \sum_{j=1}^{n_i} x_{ij}\sum_{j=1}^{n_i} y_{ij}}{n_i\sum_{j=1}^{n_i} x_{ij}^2 - \left(\sum_{j=1}^{n_i} x_{ij}\right)^2} \quad (i = 1, 2)$$

and from (8.7) estimates $b_{i0}$ for $\beta_{i0}$ are given by

$$b_{i0} = \bar y_i - b_{i1}\bar x_i \quad (i = 1, 2)$$

with

$$\bar y_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} \quad \text{and} \quad \bar x_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} \quad (i = 1, 2).$$


We have shown that the $b_{i1}$ are distributed as

$$N\left(\beta_{i1}, \frac{\sigma_i^2}{\sum_{j=1}^{n_i}(x_{ij} - \bar x_i)^2}\right)$$

if the $Y_i$ are $N(X_i\beta_i, \sigma_i^2 I_{n_i})$-distributed $(i = 1, 2)$. Under the assumption
that the two samples $(y_{1j}, x_{1j})$ and $(y_{2j}, x_{2j})$ are independent of each
other, it follows that $b_{11}$ and $b_{21}$ are also independent of each other. We
assume here the independence of both samples. Further we assume that these
samples stem from populations with equal variances, that is, we have
$\sigma_1^2 = \sigma_2^2 = \sigma^2$. Then the difference $b_{11} - b_{21}$ with expectation $\beta_{11} - \beta_{21}$ is normally
distributed and

$$t = \frac{b_{11} - b_{21} - (\beta_{11} - \beta_{21})}{s_d}$$

is $t(n_1 + n_2 - 4)$-distributed, with $s_d$ as the square root of

$$s_d^2 = \frac{\sum_{j=1}^{n_1}(y_{1j} - b_{10} - b_{11}x_{1j})^2 + \sum_{j=1}^{n_2}(y_{2j} - b_{20} - b_{21}x_{2j})^2}{n_1 + n_2 - 4}\left[\frac{1}{\sum_{j=1}^{n_1}(x_{1j} - \bar x_1)^2} + \frac{1}{\sum_{j=1}^{n_2}(x_{2j} - \bar x_2)^2}\right].$$

If the null hypothesis $\beta_{11} = \beta_{21}$ holds,

$$t = \frac{b_{11} - b_{21}}{s_d}$$

is $t(n_1 + n_2 - 4)$-distributed and can be used as a test statistic for the
corresponding t-test of this null hypothesis against the alternative hypothesis
$\beta_{11} \ne \beta_{21}$ (or one-sided alternatives).

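But also in this case, we recommend not to rely on the equality of both
variances, but to use the approximate test with the test statistic

$$t^* = \frac{b_{11} - b_{21}}{s_d^*} \tag{8.50}$$

with

$$s_d^{*2} = \frac{\sum_{j=1}^{n_1}(y_{1j} - b_{10} - b_{11}x_{1j})^2}{(n_1 - 2)\sum_{j=1}^{n_1}(x_{1j} - \bar x_1)^2} + \frac{\sum_{j=1}^{n_2}(y_{2j} - b_{20} - b_{21}x_{2j})^2}{(n_2 - 2)\sum_{j=1}^{n_2}(x_{2j} - \bar x_2)^2} = s_1^{*2} + s_2^{*2}$$

and reject $H_0$ if $|t^*|$ exceeds the corresponding quantile of the central
t-distribution with $f$ degrees of freedom with

$$f = \frac{\left(s_1^{*2} + s_2^{*2}\right)^2}{\dfrac{s_1^{*4}}{n_1 - 2} + \dfrac{s_2^{*4}}{n_2 - 2}}.$$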
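
The approximate test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names and the two small data sets are hypothetical, and the quantile of the t-distribution with $f$ degrees of freedom has to be taken from a table or a statistics package.

```python
import math

def slope_summary(x, y):
    """Return the slope b1, s*^2 = SS_res / ((n - 2) * SS_x) and n for one group."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_x
    b0 = ybar - b1 * xbar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, ss_res / ((n - 2) * ss_x), n

def approx_slope_test(x1, y1, x2, y2):
    """t* of (8.50) and the approximate degrees of freedom f."""
    b11, v1, n1 = slope_summary(x1, y1)
    b21, v2, n2 = slope_summary(x2, y2)
    t_star = (b11 - b21) / math.sqrt(v1 + v2)
    f = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 2) + v2 ** 2 / (n2 - 2))
    return t_star, f

# Hypothetical data for two groups with n1 = n2 = 5.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
y1 = [1.1, 1.9, 3.2, 3.9, 5.1]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
y2 = [2.0, 2.4, 3.1, 3.3, 4.2]

t_star, f = approx_slope_test(x1, y1, x2, y2)
print(t_star, f)
```

By construction $f$ lies between $\min(n_1 - 2, n_2 - 2)$ and $n_1 + n_2 - 4$, so the approximate test never uses more degrees of freedom than the test with pooled variance.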


We give a simple example for $n = 5$.


Example 8.5 For the data of Example 8.3, we will test each of the hypotheses

$H_0: \beta_{10} = 30$ against $H_A: \beta_{10} \ne 30$,
$H_0: \beta_{11} = 0$ against $H_A: \beta_{11} < 0$,
$H_0: (\beta_{10}, \beta_{11})^T = (30, 0)^T$ against $H_A: (\beta_{10}, \beta_{11})^T \ne (30, 0)^T$,
$H_0: \beta_{11} = \beta_{21}$ against $H_A: \beta_{11} \ne \beta_{21}$

with a first kind risk $\alpha = 0.05$. The one-sided alternative $H_A: \beta_{11} < 0$ stems from
the fact that the carotene content cannot increase during storage.

Table 8.4 is the ANOVA table for this example. $MS_{res} = 0.92203$ is the
estimate $s^2$ of $\sigma^2$.


Table 8.4 Analysis of variance table for testing the hypothesis $H_0: \beta_{10} = 30, \beta_{11} = 0$ in
Example 8.6 for $i = 1$.

Source of variation   SS         df   MS        F
Total                 393.5733   5
Regression            390.8072   2    195.40    211.9
Residual              2.7661     3    0.92203


The test statistic for the hypothesis $\beta_{10} = 30$ becomes via (8.47)

$$t = \frac{31.215 - 30}{0.7059} = 1.72 < t(3 \mid 0.975).$$


For a first kind risk 0.05, the hypothesis $\beta_{10} = 30$ is not rejected.
From (8.48) we obtain the test statistic for the hypothesis $\beta_{11} = 0$

$$t = \frac{-0.05455}{0.00394} = -13.85 < t(3 \mid 0.05),$$

and this hypothesis is rejected with a first kind risk 0.05. This result was already
given in the SPSS output of Figure 8.2. In the fifth column we find the value of
the test statistic, and the sig-value in column six is below 0.05, and this means
rejection.

The hypothesis $(\beta_{10}, \beta_{11})^T = (30, 0)^T$, as we see from the F-test statistic in
Table 8.4, is also rejected.

Finally we test the hypothesis that both (theoretical) regression lines have the
same slope, that is, the hypothesis $\beta_{11} = \beta_{21}$. For this we use the test statistic
given in (8.50) and obtain with $f = 6.24$

$$t^* = \frac{-0.05455 + 0.08098}{\sqrt{0.0047^2 + 0.011^2}} = 2.17 > t(6.24 \mid 0.975),$$

and the null hypothesis of parallelism of both regression lines (i.e. that the loss of
carotene is the same for both kinds of storage) is rejected by this approximate
test.

The test of the hypothesis that some components of $\beta$ in (8.4) equal zero is
often used to find out whether some of the regressors, that is, some columns
of $X$ in (8.4), can be dropped.

This method can be applied to test the degree of a polynomial.

From (8.34) it follows that $\beta_k$ equals zero if $\alpha_k = 0$. By this the hypothesis
$H_0: \beta_k = 0$ is identical with $H_0: \alpha_k = 0$ and can be tested by the following corollary of
Theorem 8.6.


Corollary 8.3 (of Theorem 8.6): Let

$$Y = X\alpha + e$$

be a quasilinear polynomial regression model of degree $k$, where $X$ has the form
as shown in the proof of Theorem 8.3 and $\alpha$ depends on $\beta$ in (8.33) by (8.34). Let
$Y$ be $N(X\alpha, \sigma^2 I_n)$-distributed. The hypothesis $H_0: \alpha_k = \beta_k = 0$ can be tested with
the test statistic

$$F = \frac{\dfrac{\left(\sum_{i=1}^n y_i P_{ki}\right)^2}{\sum_{i=1}^n P_{ki}^2}\,(n-k-1)}{\sum_{i=1}^n y_i^2 - \sum_{j=0}^k \dfrac{\left(\sum_{i=1}^n y_i P_{ji}\right)^2}{\sum_{i=1}^n P_{ji}^2}} = \frac{a_k^2\sum_{i=1}^n P_{ki}^2\,(n-k-1)}{\sum_{i=1}^n y_i^2 - \sum_{j=0}^k a_j^2\sum_{i=1}^n P_{ji}^2} \tag{8.51}$$

if the $a_j$ are the components of the MLE $a$ in (8.35).


Proof: $X$ has the form given in the proof of Theorem 8.3; $X^T X$ is a diagonal
matrix. Because $Y^T X(X^T X)^{-1}X^T Y = Y^T X(X^T X)^{-1}X^T X(X^T X)^{-1}X^T Y$ and from
(8.35), Equation (8.42) with $q = 1$ becomes

$$F = (n-k-1)\,\frac{a^T X^T X a - c^T X_1^T X_1 c}{Y^T Y - a^T (X^T X) a}. \tag{8.52}$$

Here $X_1$ is the matrix which arises if the last column in $X$ is dropped, and $c$ is
the LS estimator if the null hypothesis $\alpha_k = 0$ holds. Further

$$a^T X^T X a = \sum_{j=0}^k a_j^2 \sum_{i=1}^n P_{ji}^2 \tag{8.53}$$

and

$$c^T X_1^T X_1 c = \sum_{j=0}^{k-1} a_j^2 \sum_{i=1}^n P_{ji}^2 \tag{8.54}$$

and this completes the proof.


8.4 Confidence Regions 


When we know the distributions of the estimators of the parameters,
confidence regions for these parameters can be constructed. In this section we will
construct confidence regions (intervals) for the components $\beta_j$ of $\beta$, the
variance $\sigma^2$, the expectations $E(y_i)$ and the vector $\beta \in \Omega$. Again we make the
assumption that $Y$ is $N(X\beta, \sigma^2 I_n)$-distributed and that the model equation (8.4) holds.
From (8.44) it follows that

$$P\left\{t\left(n-k-1 \,\Big|\, \frac{\alpha}{2}\right) \le \frac{b_j - \beta_j}{s\sqrt{c_{jj}}} \le t\left(n-k-1 \,\Big|\, 1-\frac{\alpha}{2}\right)\right\} = 1 - \alpha \tag{8.55}$$

and, due to the symmetry of the t-distribution,

$$\left[b_j - t\left(n-k-1 \,\Big|\, 1-\frac{\alpha}{2}\right)s\sqrt{c_{jj}},\;\; b_j + t\left(n-k-1 \,\Big|\, 1-\frac{\alpha}{2}\right)s\sqrt{c_{jj}}\right] \tag{8.56}$$

is a confidence interval of the component $\beta_j$ with a confidence coefficient $1-\alpha$. In
(8.56) $c_{jj}$ is the $(j+1)$-th main diagonal element of $(X^T X)^{-1}$ and $s = \sqrt{s^2}$ is the square
root of the residual variance in (8.6). Due to the assumptions,
$\frac{s^2(n-k-1)}{\sigma^2}$ is $CS(n-k-1)$-distributed. If $\chi^2$ is a $CS(n-k-1)$-distributed random variable
and $\chi^2(n-k-1 \mid \alpha_1)$ and $\chi^2(n-k-1 \mid 1-\alpha_2)$ are chosen in such a way that
with $\alpha_1 + \alpha_2 = \alpha$,

$$P\left(\chi^2 < \chi^2(n-k-1 \mid \alpha_1)\right) = \alpha_1$$

and

$$P\left(\chi^2 > \chi^2(n-k-1 \mid 1-\alpha_2)\right) = \alpha_2,$$

then

$$P\left\{\chi^2(n-k-1 \mid \alpha_1) \le \frac{s^2(n-k-1)}{\sigma^2} \le \chi^2(n-k-1 \mid 1-\alpha_2)\right\} = 1 - \alpha$$

and a confidence interval for $\sigma^2$ with a confidence coefficient $1-\alpha$ is given by

$$\left[\frac{s^2(n-k-1)}{\chi^2(n-k-1 \mid 1-\alpha_2)},\;\; \frac{s^2(n-k-1)}{\chi^2(n-k-1 \mid \alpha_1)}\right]. \tag{8.57}$$
If we choose a vector $x = (x_0, \ldots, x_k)^T$ of the values of the regressors so that

$$\min_i x_{ij} \le x_j \le \max_i x_{ij}$$

holds for $j = 0, \ldots, k$, then by the Gauss-Markov theorem (Theorem 4.3) an
estimator $\hat y$ of $y = x^T\beta$ in the regression function is given by

$$\hat y = x^T b$$

with $b$ in (8.5).
Now $b$ is $N(\beta, \sigma^2 (X^T X)^{-1})$-distributed (independent of $s^2$) so that $x^T b$ is
$N[x^T\beta, x^T (X^T X)^{-1} x\,\sigma^2]$-distributed. From this it follows that

$$z = \frac{x^T(b - \beta)}{\sigma\sqrt{x^T (X^T X)^{-1} x}}$$

is $N(0, 1)$-distributed, and because $\frac{s^2(n-k-1)}{\sigma^2}$ is independent of $z$ and
$CS(n-k-1)$-distributed, the test statistic

$$t = \frac{x^T(b - \beta)}{s\sqrt{x^T (X^T X)^{-1} x}}$$

is $t(n-k-1)$-distributed. From this it follows that

$$\left[\hat y - t\left(n-k-1 \,\Big|\, 1-\frac{\alpha}{2}\right)s\sqrt{x^T (X^T X)^{-1} x},\;\; \hat y + t\left(n-k-1 \,\Big|\, 1-\frac{\alpha}{2}\right)s\sqrt{x^T (X^T X)^{-1} x}\right] \tag{8.58}$$

is a confidence interval for $y = x^T\beta$ with a confidence coefficient $1-\alpha$. The
confidence intervals (8.56) give for each $j$ an interval that covers $\beta_j$ with probability
$1-\alpha$. From these confidence intervals, no conclusions can be drawn about the
region in which the parameter vector $\beta$ lies with a given probability.

A region in $\Omega$ covering $\beta$ with the probability $1-\alpha$ is called a simultaneous
confidence region for $\beta_0, \ldots, \beta_k$. With the test statistic $F$ in (8.45) for the test of
$\beta = \beta^*$, we construct such a simultaneous confidence region. From (8.45) it
follows

$$P\left\{\frac{1}{(k+1)s^2}(b - \beta)^T X^T X (b - \beta) \le F(k+1, n-k-1 \mid 1-\alpha)\right\} = 1 - \alpha,$$

so that the interior and the boundary of the ellipsoid

$$(b - \beta)^T X^T X (b - \beta) = (k+1)s^2 F(k+1, n-k-1 \mid 1-\alpha)$$

is our confidence region.


Example 8.6 For Example 8.3 ($i = 1$) confidence regions for $\beta_{10}$, $\beta_{11}$, $\sigma^2$,
$y = \beta_{10} + \beta_{11}x$ and $\beta_1^T = (\beta_{10}, \beta_{11})$, each with a confidence coefficient 0.95, shall be
found. In SPSS, after activating 'confidence interval' under the button
'Statistics', we receive Table 8.5 with the confidence intervals for $\beta_{10}$ and $\beta_{11}$.


Table 8.5 SPSS output with confidence intervals (analogue to Figure 8.2).

                 Unstandardized coefficients                     95.0% confidence interval for B
Model            B         Std. error   t          Sig.     Lower bound   Upper bound
1 (Constant)     31.216    0.706        44.223     0.000    28.969        33.462
  time           -0.055    0.004        -13.848    0.001    -0.067        -0.042

Source: Reproduced with permission of IBM.


From (8.57) [0.26; 12.82] is a confidence interval for $\sigma^2$ with $\alpha_1 = \alpha_2 = \alpha/2$. But
because of the skewness of the $\chi^2$-distribution, this is not the decomposition of
$\alpha$ into two components that leads to the smallest expected width of the confidence
interval.

To calculate a 95% confidence interval for $E(y)$ in (8.58), we need for some
$x_0 \in B$ the values of

$$K_0 = \sqrt{\frac{\sum x_j^2 - 2x_0\sum x_j + n x_0^2}{n\sum (x_j - \bar x)^2}} = \sqrt{x_0^T (X^T X)^{-1} x_0},$$

given in Table 8.6 together with the confidence bounds for $E(y_j)$. Figure 8.6
shows the estimated regression line for $i = 1$ and the confidence belt, obtained
by plotting the confidence bounds for $E(y)$ in Table 8.6 and joining them.


Table 8.6 95%-confidence bounds for $E(y_j)$ in Example 8.6.

                                 Confidence bounds
$x_j$   $\hat y_j$   $K_j$       Lower    Upper
1       31.16        0.73184     28.92    33.40
60      27.94        0.56012     26.23    29.65
124     24.45        0.45340     23.06    25.84
223     19.05        0.55668     17.35    20.75
303     14.69        0.79701     12.25    17.12
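
The entries of Table 8.6 can be recomputed from a few summary values. The following Python sketch is an illustration only: the summary values ($n = 5$, storage times 1, 60, 124, 223, 303, $b_0 = 31.215$, $b_1 = -0.05455$, $s^2 = 0.92203$) are those quoted in this section, the quantile $t(3 \mid 0.975) \approx 3.182446$ is an assumed table value, and the function names are hypothetical.

```python
import math

# Summary values quoted in the example (storage times 1, 60, 124, 223, 303).
n, sum_x, sum_x2 = 5, 711.0, 160515.0
b0, b1, s2 = 31.215, -0.05455, 0.92203
t_q = 3.182446                      # t(3 | 0.975), assumed table value

ss_x = sum_x2 - sum_x ** 2 / n      # sum of (x_j - xbar)^2

def k0(x0):
    """K_0 = sqrt(x0^T (X^T X)^{-1} x0) for the vector (1, x0)."""
    return math.sqrt((sum_x2 - 2.0 * x0 * sum_x + n * x0 ** 2) / (n * ss_x))

def bounds(x0):
    """Lower and upper 95% confidence bound for E(y) at x0, see (8.58)."""
    y_hat = b0 + b1 * x0
    half = t_q * math.sqrt(s2) * k0(x0)
    return y_hat - half, y_hat + half

for x0 in (1, 60, 124, 223, 303):
    lower, upper = bounds(x0)
    print(x0, round(k0(x0), 5), round(lower, 2), round(upper, 2))
```

Up to rounding of the inputs, the printed rows reproduce the columns $K_j$, Lower and Upper of Table 8.6; the belt is narrowest near $\bar x$ and widens towards the ends of the experimental region.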


A confidence region for $\beta = (\beta_0, \beta_1)^T$ is an ellipse, given by

$$n(b_0 - \beta_0)^2 + 2\sum x_j\,(b_0 - \beta_0)(b_1 - \beta_1) + \sum x_j^2\,(b_1 - \beta_1)^2 = 2s^2 F(2, n-2 \mid 1-\alpha).$$

Using the data of Example 8.3 gives

$$5(31.215 - \beta_0)^2 - 1422(31.215 - \beta_0)(0.05455 + \beta_1) + 160515(0.05455 + \beta_1)^2 = 1.84406 \cdot 9.552.$$


Figure 8.6 Estimated regression line and confidence belt of Example 8.6 (x-axis from 0 to 300).

8.5 Models with Random Regressors 


For random regressors we only consider the linear case. 


8.5.1 Analysis 


Definition 8.7 If $x^T = (x_1, \ldots, x_{k+1})$ is a $(k+1)$-dimensional normally
distributed random vector and $X = (x_{ij})$ $(i = 1, \ldots, k+1;\ j = 1, \ldots, n)$ is a random
sample of $n$ such vectors, distributed as $x$, then the equation

$$y_j = x_{k+1,j} = \sum_{i=0}^k \beta_i x_{ij} + e_j, \qquad x_{0j} = 1 \tag{8.59}$$

with the additional assumption that the $e_j$ are independent of each other, $N(0, \sigma^2)$-
distributed and independent of the $x_{ij}$, is called a model II of the (multiple)
linear regression.

This definition can be generalised by dropping the assumption
that $Y$ is normally distributed. Nevertheless, for tests and confidence
estimation, the assumption is necessary. Correlation coefficients are always
defined as long as (8.59) holds and the distribution has finite second moments.
To estimate the parameters of (8.59), we use the same formulae as for model I.
An estimator for $\rho_{x_i y} = \sigma_{x_i y}/(\sigma_{x_i}\sigma_y)$ by (5.33) is obtained by replacing $\sigma_{x_i y}$, $\sigma_{x_i}^2$ and
$\sigma_y^2$ by the unbiased estimators $s_{x_i y}$, $s_{x_i}^2$ and $s_y^2$ of the covariances and variances,
respectively. Then we get the (not unbiased) estimator

$$r_{x_i y} = \frac{s_{x_i y}}{s_{x_i}s_y} = \frac{SP_{x_i y}}{\sqrt{SS_{x_i} SS_y}} \tag{8.60}$$

of the correlation coefficient.

At first the special case $k = 2$ is considered. The random variable $(x_1, x_2, x_3)$
shall be three-dimensional normally distributed; it can be shown that the three
conditional two-dimensional distributions $f(x_i, x_j \mid x_k)$ $(i \ne j \ne k;\ i, j, k = 1, 2, 3)$
are two-dimensional normal distributions with correlation coefficients

$$\rho_{ij \cdot k} = \frac{\rho_{ij} - \rho_{ik}\rho_{jk}}{\sqrt{(1 - \rho_{ik}^2)(1 - \rho_{jk}^2)}} \quad (i \ne j \ne k;\ i, j, k = 1, 2, 3). \tag{8.61}$$

Here $\rho_{ij}$, $\rho_{ik}$ and $\rho_{jk}$ are the correlation coefficients of the three
two-dimensional (normal) marginal distributions of $(x_i, x_j, x_k)$. It can easily be shown
that these marginal distributions are two-dimensional normal distributions
(Exercise 8.1).

The correlation coefficient (8.61) of the conditional two-dimensional normal
distribution of $(x_i, x_j)$ for given $x_k$ is called the partial correlation coefficient between
$x_i$ and $x_j$ after the cut-off of $x_k$.

The name partial correlation coefficient stems from applications and is of gen- 
eral use even if the name conditional correlation coefficient seems to be better. 

It follows from (8.61) that the value of $x_k$ has no influence on the correlation
coefficient of the conditional distribution of $(x_i, x_j)$ and therefore $\rho_{ij \cdot k}$ is
independent of $x_k$. We say that $\rho_{ij \cdot k}$ is a measure of the relationship between $x_i$ and
$x_j$ after the cut-off of the influence of $x_k$ or after the elimination of $x_k$. This
interpretation of $\rho_{ij \cdot k}$ can be illustrated as follows. We start with the marginal
distributions of $(x_i, x_k)$ and $(x_j, x_k)$. Because these marginal distributions are normal
distributions, we receive as conditional expectations (in dependency on $x_k$)

$$E(x_i \mid x_k) = \mu_i + \beta_{ik}(x_k - \mu_k) \tag{8.62}$$

and

$$E(x_j \mid x_k) = \mu_j + \beta_{jk}(x_k - \mu_k), \tag{8.63}$$

where $\mu_l = E(x_l)$ is the expectation of the one-dimensional marginal distribution
of $x_l$ and $\beta_{ik}$ and $\beta_{jk}$ are the regression coefficients of the marginal distributions.
Calculating the differences

$$d_i = d_{i \cdot k} = x_i - \mu_i - \beta_{ik}(x_k - \mu_k)$$

and

$$d_j = d_{j \cdot k} = x_j - \mu_j - \beta_{jk}(x_k - \mu_k)$$

leads to a normally distributed two-dimensional random variable $(d_{i \cdot k}, d_{j \cdot k})$. It
is to be shown that the correlation coefficient $\rho_{d_i, d_j}$ is given by (8.61).

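We have

$$\rho_{d_i, d_j} = \frac{\operatorname{cov}(d_i, d_j)}{\sqrt{\operatorname{var}(d_i)\operatorname{var}(d_j)}}. \tag{8.64}$$

Because $\operatorname{cov}(d_i, d_j) = E(d_i d_j) - E(d_i)E(d_j)$, we obtain

$$\operatorname{cov}(d_i, d_j) = E(x_i x_j) - \mu_i\mu_j - \beta_{ik}\sigma_{jk} - \beta_{jk}\sigma_{ik} + \beta_{ik}\beta_{jk}\sigma_k^2$$

and, with $\beta_{ik} = \sigma_{ik}/\sigma_k^2$ and $\beta_{jk} = \sigma_{jk}/\sigma_k^2$,

$$\operatorname{cov}(d_i, d_j) = \sigma_{ij} - \frac{\sigma_{ik}\sigma_{jk}}{\sigma_k^2} = \sigma_i\sigma_j(\rho_{ij} - \rho_{ik}\rho_{jk}).$$

Further

$$\sigma_{d_i}^2 = \operatorname{var}(d_i) = \sigma_i^2 + \beta_{ik}^2\sigma_k^2 - 2\beta_{ik}\sigma_{ik} = \sigma_i^2(1 - \rho_{ik}^2)$$

and analogously

$$\sigma_{d_j}^2 = \operatorname{var}(d_j) = \sigma_j^2(1 - \rho_{jk}^2),$$

so that $\rho_{d_i, d_j} = \rho_{ij \cdot k}$.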
Between the regression coefficient of $x$ on $y$ and the correlation coefficient
of the normally distributed random variables $(x, y)$ one has

$$\beta_{xy} = \rho\,\frac{\sigma_x}{\sigma_y}. \tag{8.65}$$

In the three-dimensional case it can be shown that the relation

$$\beta_{ij \cdot k} = \rho_{ij \cdot k}\,\frac{\sigma_{i \cdot k}}{\sigma_{j \cdot k}} \quad (i \ne j \ne k;\ i, j, k = 1, 2, 3)$$

holds, where the multiple (partial) regression coefficients $\beta_{ij \cdot k}$ are the coefficients in
the case $k = 2$. The $\beta_{ij \cdot k}$ can be interpreted as regression coefficients between $d_{i \cdot k}$
and $d_{j \cdot k}$ and are therefore often called partial regression coefficients. The $\beta_{ij \cdot k}$
show by how many units $x_i$ changes if $x_j$ increases by one unit while all other
regressors remain unchanged. For the four-dimensional normally distributed
random variable $(x_1, x_2, x_3, x_4)$, we can define a partial correlation coefficient
between two components for fixed values of the two remaining components.
We call the expression

$$\rho_{ij \cdot kl} = \rho_{ij \cdot lk} = \frac{\rho_{ij \cdot k} - \rho_{il \cdot k}\rho_{jl \cdot k}}{\sqrt{(1 - \rho_{il \cdot k}^2)(1 - \rho_{jl \cdot k}^2)}} \quad (i \ne j \ne k \ne l;\ i, j, k, l = 1, 2, 3, 4), \tag{8.66}$$

defined for the four-dimensional normally distributed random variable $(x_1, x_2,
x_3, x_4)$, a partial correlation coefficient (of second order) between $x_i$ and $x_j$ after
the cut-off of $x_k$ and $x_l$.
Analogously, partial correlation coefficients of higher order can be defined.
We obtain estimators $r_{ij \cdot k}$ and $r_{ij \cdot kl}$ for partial correlation coefficients by
replacing the simple correlation coefficients in (8.61) and (8.66) by their
estimators. For instance, we get

$$r_{ij \cdot k} = \frac{r_{ij} - r_{ik}r_{jk}}{\sqrt{(1 - r_{ik}^2)(1 - r_{jk}^2)}}. \tag{8.67}$$

Without proof we give the following theorem.


Theorem 8.8 If $(x_1, \ldots, x_k)$ is k-dimensional normally distributed and for
some partial correlation coefficient of s-th order ($s = k - 2$) the hypothesis
$\rho_{ij \cdot u_1 \ldots u_s} = 0$ ($u_1, \ldots, u_s$ are $s = k - 2$ different numbers from $1, \ldots, k$, different
from $i$ and $j$) is true, then

$$t = \frac{r_{ij \cdot u_1 \ldots u_s}\sqrt{n - k}}{\sqrt{1 - r_{ij \cdot u_1 \ldots u_s}^2}} \tag{8.68}$$

is $t(n-k)$-distributed, if $n$ values of the k-dimensional variable are observed.
In particular, for $k = 3$ ($s = 1$) under $H_0: \rho_{ij \cdot k} = 0$,

$$t = \frac{r_{ij \cdot k}\sqrt{n - 3}}{\sqrt{1 - r_{ij \cdot k}^2}}$$

is $t(n-3)$-distributed, and for $k = 4$ under $H_0: \rho_{ij \cdot kl} = 0$,

$$t = \frac{r_{ij \cdot kl}\sqrt{n - 4}}{\sqrt{1 - r_{ij \cdot kl}^2}}$$

is $t(n-4)$-distributed.
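
The estimator (8.67) and the test statistic (8.68) are easy to evaluate. The following Python sketch uses hypothetical correlation values and illustrative function names; it is meant as a minimal example of the two formulas, not as a complete test procedure (the t-quantile still has to be looked up).

```python
import math

def partial_corr(r_ij, r_ik, r_jk):
    """First-order partial correlation r_{ij.k} from simple correlations, see (8.67)."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1.0 - r_ik ** 2) * (1.0 - r_jk ** 2))

def t_partial(r, n, k):
    """Test statistic (8.68); t(n - k)-distributed under H0 for k-dimensional data."""
    return r * math.sqrt(n - k) / math.sqrt(1.0 - r ** 2)

# Hypothetical correlations from n = 30 observations of (x1, x2, x3).
r12_3 = partial_corr(0.60, 0.50, 0.40)
t_value = t_partial(r12_3, n=30, k=3)
print(r12_3, t_value)
```

If $r_{ij} = r_{ik} r_{jk}$, the partial correlation coefficient is zero: the observed correlation between $x_i$ and $x_j$ is then fully accounted for by their common relationship with $x_k$.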
By Theorem 8.8 for $k = 2$, the hypothesis $\rho = 0$ can be tested with the test
statistic (8.68). For a two-sided alternative ($\rho \ne 0$), the null hypothesis is rejected
if $|t| > t\left(n-2 \mid 1-\frac{\alpha}{2}\right)$.
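To test the hypothesis $H_0: \rho = \rho^* \ne 0$, we replace $r$ by the Fisher transform

$$z = \frac{1}{2}\ln\frac{1 + r}{1 - r} \tag{8.69}$$

that is approximately normally distributed with expectation

$$E(z) \approx \frac{1}{2}\ln\frac{1 + \rho}{1 - \rho} + \frac{\rho}{2(n-1)}$$

and variance $\operatorname{var}(z) \approx \frac{1}{n-3}$. If the hypothesis $\rho = \rho^*$ is valid,

$$u = \left(z - \frac{1}{2}\ln\frac{1 + \rho^*}{1 - \rho^*} - \frac{\rho^*}{2(n-1)}\right)\sqrt{n - 3}$$

is approximately $N(0, 1)$-distributed. For large $n$, in place of $u$ also

$$u^* = \left(z - \frac{1}{2}\ln\frac{1 + \rho^*}{1 - \rho^*}\right)\sqrt{n - 3}$$

can be used. An approximate $(1-\alpha) \cdot 100\%$ confidence interval for $\rho$ is

$$\left[\tanh\left(z - \frac{z_{1-\alpha/2}}{\sqrt{n-3}}\right),\;\; \tanh\left(z + \frac{z_{1-\alpha/2}}{\sqrt{n-3}}\right)\right]$$

with the $\left(1 - \frac{\alpha}{2}\right)$-quantile $z_{1-\alpha/2}$ of the standard normal distribution
$\left[P(z > z_{1-\alpha/2}) = \frac{\alpha}{2}\right]$.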
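
The Fisher transform and the approximate confidence interval can be sketched with the Python standard library. The function names and the example values $r = 0.6$, $n = 28$ are assumptions for illustration; `math.atanh(r)` equals $\frac{1}{2}\ln\frac{1+r}{1-r}$, and the statistic below is the large-sample version without the term $\rho^*/(2(n-1))$.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for rho via the Fisher transform."""
    z = math.atanh(r)                              # 0.5 * ln((1 + r) / (1 - r)), see (8.69)
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # standard normal quantile
    half = q / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

def fisher_u(r, n, rho_star):
    """Large-sample statistic for H0: rho = rho*, approximately N(0, 1) under H0."""
    return (math.atanh(r) - math.atanh(rho_star)) * math.sqrt(n - 3)

# Hypothetical sample correlation r = 0.6 from n = 28 pairs.
lower, upper = fisher_ci(0.6, 28)
u_value = fisher_u(0.6, 28, 0.3)
print(lower, upper, u_value)
```

The interval is not symmetric around $r$, because the back-transformation by $\tanh$ compresses values close to $\pm 1$.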


A sequential test of the hypothesis $\rho = 0$ was already given in Chapter 3 using
$z = \frac{1}{2}\ln\frac{1+r}{1-r}$ in place of $z = \ln\frac{1+r}{1-r}$.
To interpret the value of $\rho$ (and also of $r$), we again consider the regression
function $f(x) = E(y \mid x) = \alpha_0 + \alpha_1 x$. $\rho^2$ can now be explained as a measure of the
proportion of the variance of $y$ explainable by the regression on $x$. The
conditional variance of $y$ is

$$\operatorname{var}(y \mid x) = \sigma_y^2(1 - \rho^2)$$

and

$$\frac{\operatorname{var}(y \mid x)}{\sigma_y^2} = 1 - \rho^2$$

is the proportion of the variance of $y$ not explainable by the regression on $x$, and
by this the statement above follows. We call $\rho^2 = B$ the measure of determination.

To construct confidence intervals for $\beta_0$ and $\beta_1$ or to test hypotheses about
these parameters seems to be difficult, but the methods for model I can also be
applied for model II. We demonstrate this using the example of the confidence interval
for $\beta_0$. The argumentation for confidence intervals for other parameters and for
the statistical tests is analogous.

The probability statement

$$P\left[b_0 - t\left(n-2 \,\Big|\, 1-\frac{\alpha}{2}\right)s_0 \le \beta_0 \le b_0 + t\left(n-2 \,\Big|\, 1-\frac{\alpha}{2}\right)s_0\right] = 1 - \alpha,$$

leading to the confidence interval (8.56) for $j = 0$, is true if for fixed values
$x_1, \ldots, x_n$ samples of y-values are selected repeatedly. Using the frequency
interpretation, $\beta_0$ is covered in about $(1-\alpha) \cdot 100\%$ of the cases by the interval
(8.56). This statement is valid for each arbitrary n-tuple $x_{i_1}, \ldots, x_{i_n}$, also for
an n-tuple $x_{i_1}, \ldots, x_{i_n}$ randomly selected from the distribution, because
the probability statement does not depend on $x_1, \ldots, x_n$ if the conditional distribution of the $y_j$
is normal. But this is the case, because $(y, x_1, \ldots, x_k)$ was assumed to be
normally distributed. By this the construction of confidence intervals and testing
of hypotheses can be done by the methods and formulae given above. But the
expected width of the confidence intervals and the power function of the tests
differ for both models.

That $\left[b_1 - t\left(n-2 \mid 1-\frac{\alpha}{2}\right)s_1,\; b_1 + t\left(n-2 \mid 1-\frac{\alpha}{2}\right)s_1\right]$ is really a confidence interval
with a confidence coefficient $1-\alpha$ also for model II can of course be proven
exactly, using a theorem of Bartlett (1933) by which

$$t_1 = \frac{s_x\sqrt{n-2}}{\sqrt{s_y^2 - b_1^2 s_x^2}}\,(b_1 - \beta_1)$$

is $t(n-2)$-distributed.


8.5.2 Experimental Designs 


The experimental design for model II of the regression analysis differs
fundamentally from that of model I. Because $x$ in model II is a random variable,
the problem of the optimal choice of $x$ does not occur. Experimental design
in model II means only the optimal choice of $n$ in dependency on given precision
requirements. A systematic description is given in Rasch et al. (2008).
We repeat this in the following.

At first we restrict ourselves in (8.59) to $k = 1$ and consider the more general model of
the regression within $a \ge 1$ groups with the same slope $\beta_I$:

$$y_{hj} = \beta_{h0} + \beta_I x_{hj} + e_{hj} \quad (h = 1, \ldots, a;\ j = 1, \ldots, n_h \ge 2). \tag{8.70}$$

We estimate $\beta_I$ for $a > 1$ not by (8.9), but by

$$b_I = \frac{\sum_{h=1}^a SP_{xy}^{(h)}}{\sum_{h=1}^a SS_x^{(h)}} = \frac{SP_{Ixy}}{SS_{Ix}} \tag{8.71}$$

with $SP_{xy}^{(h)}$ and $SS_x^{(h)}$ for each of the $a$ groups as defined in Example 8.1.


If we look for a minimal $n = \sum_{h=1}^a n_h$ so that $V(b_I) \le C$, we find in
Bock (1998)

$$n - a - 2 = \left\lceil \frac{\sigma^2}{C\,\sigma_x^2} \right\rceil.$$
If in (8.59) for $k = 1$ a $(1-\alpha)$-confidence interval is to be given for the
expectation $E(y \mid x) = \beta_0 + \beta_1 x$ so that the expectation of the square of the half
width of the interval (8.58) (for $k = 1$) does not exceed the value $d^2$, then


n 3-(5 hi saa (80H) (1 -m)*]@ (n-2)1-5) | (8.72) 


must be chosen. 
The following theorem of Bock (1998) is important and is given without proof.

Theorem 8.9 The minimal sample size n for the test of the hypothesis
$H_0: \beta_1 = \beta_{10}$ with the t-test statistic (8.48), determined so that for
a given first kind risk $\alpha$ the second kind risk $\beta$ is not larger than $\beta^*$
as well as $|\beta_1 - \beta_{10}| \le d$, is given by


A(zp +2)” 


(8.73) 


~ 2 
dox. dox 
Inj 1+ Inj 1 
( ( “| ( iiss) | 
Here $P = 1 - \alpha$ for one-sided and $P = 1 - \alpha/2$ for two-sided alternatives.


Concerning the optimal choice of the sample size for comparing two or more 
slopes (test for parallelism), we refer to Rasch et al. (2008). 


8.6 Mixed Models 


If the conditional expectation of the component y of an r-dimensional random
variable $(y, x_{k-r+2}, \dots, x_k)$ is a function of $k - r + 1$ further (non-random)
regressors, in place of (8.59)

$$y_j = \sum_{i=0}^{k-r+1} \beta_i x_{ij} + \sum_{i=k-r+2}^{k} \beta_i x_{ij} + e_j, \quad x_{0j} = 1 \qquad (8.74)$$

must be used.


Definition 8.8 Model equation (8.74) under the assumption that the $e_j$ are
independent of each other and of the $x_{ij}$ and $N(0, \sigma^2)$-distributed, and the
vectors $x_j = (y_j, x_{k-r+2,j}, \dots, x_{k,j})$ are independent of each other and
$N(\mu, \Sigma)$-distributed with the vector of marginal expectations


k T 

-r+1 

we = ( S- Prorat) 
i=0 


is called the mixed model of the linear regression ($|\Sigma| \ne 0$).


Estimators and tests can formally be used as for model II. The problem of
the experimental design consists in the optimal choice of the matrix of the
$x_{ij}$ $(i = 0, \dots, k-r+1;\ j = 1, \dots, n)$ and the optimal experimental size n.
Results can be found in Bartko (1981).
8.7 Concluding Remarks about Models of Regression Analysis


Because the estimators for $\beta_0$ and $\beta_1$ for model I and model II equal each
other and tests and confidence intervals are constructed by the same formulae, the
reader may think that no distinction between the two models has to be made.
Indeed, in many instructions for the statistical analysis of observed material
and in nearly all program packages, this subtle distinction is missing. But the
equality of the numerical treatment of both models does not justify neglecting
the distinction between them in a mathematical treatment. Further, there are
differences between the two models that must also be considered in a purely
numerical analysis. We describe this briefly for $k = 1$.

1) In model I only one regression function is reasonable:
$E(y) = \alpha_0 + \alpha_1 x$.
But for model II two regression functions are possible:
$E(y \mid x) = \alpha_0 + \alpha_1 x$ and $E(x \mid y) = \beta_0 + \beta_1 y$.

For model II the question arises which regression function should be used. If
the parameters of the regression function are estimated to predict the values of
one variable from observed values of the other one, we recommend using the
variable that is to be predicted as regressand. The reason is that the
corresponding regression line is estimated by the least squares method, in which
the deviations parallel to the axis of the regressand are squared and their sum
is minimised.

But if the two-dimensional normal distribution is truncated in such a way that
the region of the universe is restricted for only one variable (as in breeding by
selection concerning one variable), that variable cannot be used as regressand.
We illustrate this by an example.


Example 8.7 We consider a fictive finite universe, as shown in Figure 8.7, with
linear regression functions f(x) and g(y). If we truncate with respect to x (the
regressor), the samples stem from that part of the population where x > 3. For
simplification, we assume that the sample is the total remaining population. Then
the regression function is identical with the regression function for the total
universe ($\alpha_0 = 0$, $\alpha_1 = \frac{1}{2}$) and given by the function
$E(y \mid x) = \frac{1}{2}x$. Truncation with respect to y leads to different
regression lines. Truncation for y > 3 (y < -3) is shown in Figure 8.7 and leads
to the regression functions

$$f_1(x) = E(y \mid x, y > 3) = 3.25 + 0.25x,$$
$$f_2(x) = E(y \mid x, y < -3) = -3.25 + 0.25x,$$

respectively, and thus to wrong estimates of $\alpha_0 = 0$ and $\alpha_1 = 0.5$.


Figure 8.7 Fictive population with truncation shown in Example 8.7. 


The example shows that truncation with respect to the regressand results in
unacceptable estimates, while truncation with respect to the regressors causes
no problem.
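The effect of truncation can also be seen in a small simulation. The sketch below does not use the population of Figure 8.7; it assumes a hypothetical bivariate normal population with $E(y \mid x) = 0.5x$ (standard deviation 4 for x and 1.5 for the error, both chosen only for illustration) and compares the least squares line in the full population, after truncation at x > 3 and after truncation at y > 3.

```python
import random

def ls_line(points):
    """Least squares intercept and slope of y on x."""
    n = len(points)
    x_bar = sum(p[0] for p in points) / n
    y_bar = sum(p[1] for p in points) / n
    ss_x = sum((p[0] - x_bar) ** 2 for p in points)
    sp_xy = sum((p[0] - x_bar) * (p[1] - y_bar) for p in points)
    slope = sp_xy / ss_x
    return y_bar - slope * x_bar, slope

rng = random.Random(1)                     # fixed seed for reproducibility
population = []
for _ in range(20000):
    x = rng.gauss(0.0, 4.0)                # hypothetical regressor distribution
    y = 0.5 * x + rng.gauss(0.0, 1.5)      # E(y|x) = 0.5 x
    population.append((x, y))

full = ls_line(population)
trunc_x = ls_line([p for p in population if p[0] > 3.0])   # selection on regressor
trunc_y = ls_line([p for p in population if p[1] > 3.0])   # selection on regressand

print("full population:   intercept %.2f, slope %.2f" % full)
print("truncated at x > 3: intercept %.2f, slope %.2f" % trunc_x)
print("truncated at y > 3: intercept %.2f, slope %.2f" % trunc_y)
```

Under these assumptions the slope stays close to 0.5 after truncation in x, whereas truncation in y flattens the line and shifts the intercept upwards, in the same direction as in the example.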


2) While for model I the estimators $b_0$ and $b_1$ are normally distributed, this is
not the case for model II.

3) The confidence intervals for both parameters are calculated by the same for-
mulae, but the expected width of these intervals differs between the two models.

4) The hypotheses for both models are tested by the same test statistic, but nev-
ertheless the tests are different because they have different power functions,
leading to different sample sizes.

5) In case of model II, the regression analysis can be completed by calculating
the correlation coefficient. For model I it may also be calculated, but it cannot
be interpreted as a statistical estimate of any population parameter and
should be avoided. That the calculation of a sample correlation coefficient
(as done by program packages) for model I is unreasonable follows from
the fact that its value can be manipulated by a suitable choice of the $x_i$.

6) In the experimental design of model I, the optimal choice of the matrix X
is important, whereas for model II only the optimal choice of the sample
size is important.
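Point 5 can be illustrated numerically. The sketch below assumes a hypothetical model I situation, $y = 0.5x + e$ with error standard deviation 1, and shows that the sample correlation coefficient changes drastically when only the design (the range of the fixed $x_i$) is changed; all names and numbers are illustrative.

```python
import math
import random

def sample_correlation(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

def simulate(x_values, rng):
    """Same model for every design: y = 0.5 x + e, e ~ N(0, 1)."""
    return [0.5 * x + rng.gauss(0.0, 1.0) for x in x_values]

rng = random.Random(7)
n = 2000
narrow = [4.0 + 2.0 * i / (n - 1) for i in range(n)]   # design points in [4, 6]
wide = [10.0 * i / (n - 1) for i in range(n)]          # design points in [0, 10]

r_narrow = sample_correlation(narrow, simulate(narrow, rng))
r_wide = sample_correlation(wide, simulate(wide, rng))
print("r with narrow design: %.2f, with wide design: %.2f" % (r_narrow, r_wide))
```

Slope and error variance are identical in both runs; only the chosen $x_i$ differ, which is why r is not an estimate of a population parameter in model I.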
We considered only the most important models of linear regression. Mod-
els with errors in the regressor variables are not included in this book. Models
with random regression coefficients $\beta$, as occurring in population mathemat-
ics if each individual has its own regression coefficient, are discussed in Swamy
(1971) and Johansen (1984).


8.8 Exercises 


8.1 Derive Equations (8.9) and (8.10) using the partial derivatives of S given in
the text before these equations.


8.2 Prove Corollary 8.1. 


8.3 Estimate the parameters in the quasilinear regression model:
$y_i = \beta_0 + \beta_1 \cos(2x_i) + \beta_2 \ln(6x_i) + e_i$ $(i = 1, \dots, n)$.


8.4 Calculate in Example 8.3 all estimates for the storage in glass with SPSS.


8.5 Determine for Example 8.3 the G- and D-optimal design in the experi-
mental region and calculate the determinants $|X_G^T X_G|$ and $|X_D^T X_D|$.
In this chapter estimates are given for parameters in such regression functions
that are non-linear in $x \in B \subset R$ and are not presentable in the form (8.19). We
restrict ourselves to real-valued regressors x; generalisations to vector-valued
x are simple.


Definition 9.1 Regression functions $f(x, \theta)$ in a regressor $x \in B \subset R$ and with
the parameter vector

$$\theta = (\theta_1, \dots, \theta_p)^T, \quad \theta \in \Omega \subset R^p,$$

which are non-linear in x and in at least one of the $\theta_i$ and cannot be made linear
or quasilinear by any continuous transformation of the non-linearity parameter
are called intrinsically non-linear regression functions. Correspondingly we
also speak of intrinsically non-linear regression. More precisely, supposing $f(x, \theta)$
is differentiable with respect to $\theta$ and

$$\frac{\partial f(x, \theta)}{\partial \theta}$$

is the first derivative of $f(x, \theta)$ with respect to $\theta$, the regression function is
said to be partially non-linear if

$$\frac{\partial f(x, \theta)}{\partial \theta} = C(\theta)\, g(x, \varphi), \quad \varphi = (\theta_{i_1}, \dots, \theta_{i_r})^T \qquad (9.1)$$

and $0 < r < p$, where $C(\theta)$ is a $(p \times p)$-matrix not depending on x and $\varphi$,
chosen in such a way that r is minimal (r = 0 means a quasilinear regression).
If r = p, then $f(x, \theta)$ is called completely non-linear.

The $\theta_{i_j}$ $(j = 1, \dots, r)$ are called non-linearity parameters, and the other
components of $\theta$ are called linearity parameters.
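We illustrate the definition by some examples.


Example 9.1 We consider

$$f(x, \theta) = \frac{\theta_1 x}{(1 - x)\,[\theta_2 + (1 - \theta_2)x]}.$$

That is, we have $p = 2$, $\theta = (\theta_1, \theta_2)^T$ and obtain

$$\frac{\partial f(x, \theta)}{\partial \theta} =
\begin{pmatrix} \dfrac{x}{(1-x)[\theta_2 + (1-\theta_2)x]} \\[2ex] \dfrac{-\theta_1 x}{[\theta_2 + (1-\theta_2)x]^2} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & \theta_1 \end{pmatrix}
\begin{pmatrix} \dfrac{x}{(1-x)[\theta_2 + (1-\theta_2)x]} \\[2ex] \dfrac{-x}{[\theta_2 + (1-\theta_2)x]^2} \end{pmatrix}.$$

Here $\theta_1$ is a linearity parameter and $\varphi = \theta_2$ a non-linearity parameter.
Further r = 1.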


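Example 9.2 We consider

$$f(x, \theta) = \theta_1\left(x + e^{-\theta_3 x}\right) - \theta_2 x e^{-\theta_3 x},$$

that is, we have $\theta^T = (\theta_1, \theta_2, \theta_3)$ and p = 3 and obtain

$$\frac{\partial f(x, \theta)}{\partial \theta} =
\begin{pmatrix} x + e^{-\theta_3 x} \\ -x e^{-\theta_3 x} \\ (-x\theta_1 + x^2\theta_2)\, e^{-\theta_3 x} \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \theta_1 & \theta_2 \end{pmatrix}
\begin{pmatrix} x + e^{-\theta_3 x} \\ -x e^{-\theta_3 x} \\ x^2 e^{-\theta_3 x} \end{pmatrix}.$$

$\varphi = \theta_3$ is a non-linearity parameter, and $\theta_1$ and $\theta_2$ are linearity
parameters (i.e. r = 1).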
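The factorisation of the gradient in Example 9.2 can be checked numerically. The following sketch assumes the function is $f(x, \theta) = \theta_1(x + e^{-\theta_3 x}) - \theta_2 x e^{-\theta_3 x}$ and compares a central-difference gradient with the product $C(\theta)\,g(x, \varphi)$; the parameter values and function names are arbitrary illustrations.

```python
import math

def f(x, theta):
    t1, t2, t3 = theta
    return t1 * (x + math.exp(-t3 * x)) - t2 * x * math.exp(-t3 * x)

def numerical_gradient(x, theta, h=1e-6):
    """Central differences with respect to each component of theta."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += h
        down[j] -= h
        grad.append((f(x, up) - f(x, down)) / (2.0 * h))
    return grad

def factorised_gradient(x, theta):
    """C(theta) g(x, phi) with phi = theta_3 as the only non-linearity parameter."""
    t1, t2, t3 = theta
    c = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, t1, t2]]                       # does not depend on x or theta_3
    e = math.exp(-t3 * x)
    g = [x + e, -x * e, x * x * e]            # depends on x and theta_3 only
    return [sum(c[i][k] * g[k] for k in range(3)) for i in range(3)]

theta = (1.5, -0.7, 0.4)                      # arbitrary test point
max_gap = max(
    abs(a - b)
    for x in (0.5, 1.0, 2.0)
    for a, b in zip(numerical_gradient(x, theta), factorised_gradient(x, theta))
)
```

A very small `max_gap` indicates that the two expressions agree at the chosen points, which is what the definition of partial non-linearity with r = 1 requires here.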

For linear models a general theory of estimation and testing was possible, and by
the Gauss–Markov theorem, we could verify optimal properties of the least
squares (LS) estimator $\hat\beta$. A corresponding theory for the general (non-linear)
case does not exist. For quasilinear regression the theory of linear models, as
shown in Section 8.2, can be applied. For intrinsically non-linear problems
the situation can be characterised as follows:


• The existing theoretical results are of little use for solving practical
problems; many results about distributions of the estimates are asymptotic;
for special cases simulation results exist.

• The practical approach leads to numerical problems for iterative solutions;
scarcely anything is known about properties of the distributions of the
estimators. The application of the methods of intrinsically non-linear
regression is more a problem of numerical mathematics than of mathematical
statistics.

• A compromise could be to go over to quasilinear approximations of the non-
linear problem and to conduct the parameter estimation for the approximate
model. But by this we lose the interpretability of the parameters, which
practitioners desire in many applications.


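We start with the model equation

$$Y = \eta + e \qquad (9.2)$$

with the side conditions $E(e) = 0_n$ $(E(Y) = \eta)$ and $\mathrm{var}(e) = \sigma^2 I_n$
$(\sigma^2 > 0)$. Y, $\eta$ and e are vectors with components $y_i$, $\eta_i$ and $e_i$
$(i = 1, \dots, n)$, respectively, where the $\eta_i$ are intrinsically non-linear functions

$$\eta_i = f(x_i, \theta), \quad \theta \in \Omega \subset R^p \quad (i = 1, \dots, n) \qquad (9.3)$$

in the regressor values $x_i \in B \subset R$. We use the abbreviations

$$\begin{aligned}
\eta(\theta) &= (f(x_1, \theta), \dots, f(x_n, \theta))^T, \\
f_j(x, \theta) &= \frac{\partial f(x, \theta)}{\partial \theta_j}, \quad
f_{jk}(x, \theta) = \frac{\partial^2 f(x, \theta)}{\partial \theta_j\, \partial \theta_k}, \\
F &= F(\theta) = (f_j(x_i, \theta)), \\
K_i &= K_i(\theta) = (k_{jk}(x_i, \theta)) = (f_{jk}(x_i, \theta)),
\end{aligned} \qquad (9.4)$$

and assume always that $n > p$. Further

$$R(\theta) = \|Y - \eta(\theta)\|^2 = \sum_{i=1}^{n} [y_i - f(x_i, \theta)]^2. \qquad (9.5)$$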


The first question is whether different values of $\theta$ always lead to different
parameters of the distribution of Y, in other words, whether the parameter is
identifiable. Identifiability is a necessary assumption for estimability of $\theta$.
But the identifiability condition is often very restrictive in the intrinsically
non-linear case, so that we will not discuss this further. Instead we choose a
pragmatic approach that also can be used if identifiability is established.


Definition 9.2 The random variable $\hat\theta$ is called LS estimator of $\theta$ if its
realisation $\hat\theta$ is a unique solution of

$$R(\hat\theta) = \min_{\theta \in \Omega} R(\theta). \qquad (9.6)$$

In (9.6) $R(\theta)$ is given by (9.5). Further let

$$\hat\eta = f(x, \hat\theta)$$

be the LS estimator of $\eta = f(x, \theta)$. In place of (9.6) we also write

$$\hat\theta = \arg\min_{\theta \in \Omega} R(\theta).$$

We always assume for all discussions that for (9.6) a unique solution exists.
The estimators of the parameters by the LS method for the intrinsically non-
linear regression function are in general not unbiased. The exact distribution
is usually unknown, and therefore for confidence estimations and tests, we have
to use asymptotic distributions.

The possibility of approximating function (9.3) by replacing $f(x, \theta)$ by a quasi-
linear function was discussed in the literature (see Box and Lucas, 1959; Box and
Draper, 1963; Karson et al., 1969; Ermakoff, 1970; Bunke, 1973; Petersen, 1973).
For instance, a continuously differentiable function $f(x, \theta)$ could be expanded
in a Taylor series stopping after a designated number of terms, with the design
(the $x_i$) chosen in such a way that the discrepancy between $f(x, \theta)$ and the
approximate quasilinear function is a minimum. The parameters of the approximate
function can then be estimated by the methods of Chapter 8.

We now assume that the experimenter is interested in the parameters of a
special intrinsically non-linear function obtained from a subject-specific differ-
ential equation. We must therefore use direct methods even if we know only little
about the statistical properties of the estimators.

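If $f(x, \theta)$ is continuously differentiable with respect to $\theta$, we obtain by
setting the first partial derivatives of (9.5) with respect to the components of
$\theta$ to zero

$$R_j(\hat\theta) = -2\,\eta_j^T(\hat\theta)\,[Y - \eta(\hat\theta)] = 0 \quad (j = 1, \dots, p) \qquad (9.7)$$

with

$$R_j(\theta) = \frac{\partial R(\theta)}{\partial \theta_j} \quad\text{and}\quad
\eta_j(\theta) = [f_j(x_1, \theta), \dots, f_j(x_n, \theta)]^T$$

and

$$f_j(x, \theta) = \frac{\partial f(x, \theta)}{\partial \theta_j}.$$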


9.1 Estimating by the Least Squares Method 


At first we give numerical methods for an approximate solution of (9.7) or
(9.6). The existence of a unique solution is assumed from now on. In
Section 9.1.2 we give methods that work without exact knowledge of the function
to be minimised and without using the first derivatives. In Section 9.1.3 we
present methods applied directly to the differential equation and not to its inte-
gral $f(x, \theta)$.
9.1.1 Gauss–Newton Method


We assume that $f(x, \theta)$ is twice continuously differentiable with respect to
$\theta \in \Omega$ and that for a given x exactly one local minimum with respect to $\theta$
exists. We expand $f(x, \theta)$ around $\theta_0 \in \Omega$ in a Taylor series stopping after
the first-order terms. If $f(x, \theta_0)$ is the value of $f(x, \theta)$ at $\theta = \theta_0$, then

$$f(x, \theta) \approx f(x, \theta_0) + (\theta - \theta_0)^T
\left.\frac{\partial f(x, \theta)}{\partial \theta}\right|_{\theta = \theta_0}
= \tilde f^{(0)}(x, \theta). \qquad (9.8)$$

We approximate (9.2), written in its realisations, starting with l = 0 by

$$Y = \tilde\eta^{(l)} + e, \quad
\tilde\eta^{(l)} = \left[\tilde f^{(l)}(x_1, \theta), \dots, \tilde f^{(l)}(x_n, \theta)\right]^T. \qquad (9.9)$$

Equation (9.9) is linear in $\theta - \theta_l = \Delta\theta_l$ (l = 0). The Gauss–Newton method
means that $\Delta\theta_0$ in (9.9) is estimated by the LS method, and with the estimate
$\widehat{\Delta\theta}_0$ we build the vector $\theta_1 = \theta_0 + v_0 \widehat{\Delta\theta}_0$. Around
$\theta_1$ once more a Taylor series analogous to (9.8) is constructed, and model (9.9)
is now used with l = 1.

Now $\Delta\theta_1 = \theta - \theta_1$ has to be estimated by the LS method. If $\theta_0$ is near
enough to the solution $\hat\theta$ of (9.7) (what 'near enough' means is explained exactly
in the algorithm of Hartley below), the sequence $\theta_0, \theta_1, \dots$ converges to
$\hat\theta$. If we however start with a bad initial vector $\theta_0$, truncating the Taylor
series after the first-order terms leads to large differences between f and
$\tilde f$, and the method converges not to the global minimum but to a relative
minimum (see Figure 9.4). If $\theta_l \in \Omega$ is the vector in the l-th step and
$\tilde f^{(l)}(x, \theta)$ is determined analogously to (9.8), then in the l-th step the
simultaneous equations become

$$F_l^T F_l\, \Delta\theta_l = F_l^T \left(Y - \eta(\theta_l)\right) \qquad (9.10)$$

with $F_l = F(\theta_l) = (f_j(x_i, \theta_l))$. We assume that the $x_i$ are chosen in such a
way that $F_l^T F_l$ is non-singular, so that (9.10) has a unique solution. The iteration

$$\theta_{l+1} = \theta_l + v_l\, \widehat{\Delta\theta}_l \qquad (9.11)$$

can be continued (convergence assumed) until for all j

$$|\theta_{j,l+1} - \theta_{j,l}| < \delta_j \quad \left(\theta_l = (\theta_{1l}, \dots, \theta_{pl})^T\right).$$

But because the objective of the iteration is the solution of (9.6), it makes more
sense to continue the iteration, inserting $\theta_l$ and $\theta_{l+1}$ for $\theta$ in (9.6), until

$$|R(\theta_l) - R(\theta_{l+1})| < \varepsilon$$

is reached for the first time.
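The iteration can be sketched as follows. This is a minimal illustration in plain Python, assuming the unmodified step $v_l = 1$, the normal equations $F_l^T F_l \Delta\theta_l = F_l^T (Y - \eta(\theta_l))$ and the stopping rule on $|R(\theta_l) - R(\theta_{l+1})|$; the helper names (`solve_linear`, `gauss_newton`) are not taken from the text. As data we use the one-parameter function $10/(1 + 2e^{\theta x})$ with $x = (1, 2)$ and $Y = (4, 8)$, which the text treats later in Example 9.3.

```python
import math

def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = s / m[r][r]
    return z

def rss(f, x, y, theta):
    return sum((yi - f(xi, theta)) ** 2 for xi, yi in zip(x, y))

def gauss_newton(f, grad, x, y, theta0, eps=1e-12, max_iter=100):
    theta = list(theta0)
    p = len(theta)
    r_old = rss(f, x, y, theta)
    for _ in range(max_iter):
        rows = [grad(xi, theta) for xi in x]                 # rows of F_l (n x p)
        resid = [yi - f(xi, theta) for xi, yi in zip(x, y)]  # Y - eta(theta_l)
        ftf = [[sum(row[j] * row[k] for row in rows) for k in range(p)]
               for j in range(p)]
        ftr = [sum(row[j] * ri for row, ri in zip(rows, resid)) for j in range(p)]
        delta = solve_linear(ftf, ftr)                       # normal equations
        theta = [t + d for t, d in zip(theta, delta)]        # step with v_l = 1
        r_new = rss(f, x, y, theta)
        if abs(r_old - r_new) < eps:
            break
        r_old = r_new
    return theta, r_new

def f(x, theta):
    return 10.0 / (1.0 + 2.0 * math.exp(theta[0] * x))

def grad(x, theta):
    e = math.exp(theta[0] * x)
    return [-20.0 * x * e / (1.0 + 2.0 * e) ** 2]

theta_hat, r_min = gauss_newton(f, grad, [1.0, 2.0], [4.0, 8.0], [0.0])
print("theta_hat = %.3f, R(theta_hat) = %.3f" % (theta_hat[0], r_min))
```

With the full step the residual sum of squares is not guaranteed to decrease in every iteration; for this small example the start value 0 is close enough for the sequence to settle near $\theta \approx -0.82$.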
In the original form of the Gauss–Newton iteration, $v_l = 1$ $(l = 0, 1, \dots)$ was
used. But then the convergence of the procedure is not guaranteed or can be very
slow.

Some proposals are known to modify the Gauss–Newton method, for
example, by Levenberg (1944) and Hartley (1961); the latter will
be described now.

The method of Hartley offers quicker convergence and has the advantage
that assumptions for reliable convergence can be formulated.

Let the following assumptions be fulfilled:


V1: $f(x, \theta)$ has for all x continuous first and second derivatives with respect
to $\theta$.

V2: For all $\theta \in \Omega_0 \subset \Omega$ ($\Omega_0$ bounded, convex) with
$F = [\eta_1(\theta), \dots, \eta_p(\theta)]$, the matrix $F^T F$ is positive definite.

V3: There exists a $\theta_0 \in \Omega_0 \subset \Omega$ so that

$$R(\theta_0) < \inf_{\theta \in \Omega \setminus \Omega_0} R(\theta).$$


Hartley's modification of the Gauss–Newton method means choosing $v_l$ in
(9.11) so that $R(\theta_l + v_l \Delta\theta_l)$, for a given $\theta_l$ as a function of $v_l$ for
$0 \le v_l \le 1$, is a minimum.

Hartley proved the following theorems, which we give without proof.


Theorem 9.1 (Existence theorem)
If V1 to V3 are fulfilled, then the sequence $\{\theta_l\}$ of vectors in (9.11), with
$v_l$ minimising $R(\theta_l + v_l \Delta\theta_l)$ for given $\theta_l$ and $0 \le v_l \le 1$, always
has a subsequence $\{\theta_u\}$ that converges to a solution $\theta^*$ of

$$R(\theta^*) = \min_{\theta \in \Omega_0} R(\theta).$$

For bounded and convex $\Omega_0$, by this theorem, Hartley's method converges
to a solution of (9.6).


Theorem 9.2 (Uniqueness theorem)
If the assumptions of Theorem 9.1 are valid and, with the notations of this
section, the quadratic form $a^T R a$ with $R = (R_{ij}(\theta))$ and
$R_{ij}(\theta) = \frac{\partial^2 R(\theta)}{\partial \theta_i\, \partial \theta_j}$ is positive
definite in $\Omega_0$, then there is only one stationary point of $R(\theta)$.


A problem of Hartley's method is the suitable choice of a point $\theta_0$ in a
bounded convex set $\Omega_0$. The numerical determination of $v_{l+1}$ is often
laborious. Approximately $v_{l+1}$ can be found by quadratic interpolation with the
values $v^{*}_{l+1} = 0$, $v^{**}_{l+1} = \frac{1}{2}$ and $v^{***}_{l+1} = 1$ from

$$v_{l+1} = \frac{1}{2} + \frac{1}{4}\,
\frac{R(\theta_l) - R(\theta_l + \Delta\theta_l)}
{R(\theta_l + \Delta\theta_l) - 2R\left(\theta_l + \frac{1}{2}\Delta\theta_l\right) + R(\theta_l)}. \qquad (9.12)$$
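The interpolation formula can be sketched as a small helper. It assumes that (9.12) is the vertex of the parabola through the values of R at v = 0, 1/2 and 1; the clipping to [0, 1] and the fallback for a non-convex parabola are additions made here for robustness and are not part of the text.

```python
def hartley_step(r0, r_half, r1):
    """Step size from quadratic interpolation of R at v = 0, 1/2 and 1."""
    curvature = r1 - 2.0 * r_half + r0
    if curvature <= 0.0:
        # Assumption for this sketch: if the parabola is not convex,
        # take the best of the three trial points instead.
        return min(((r0, 0.0), (r_half, 0.5), (r1, 1.0)))[1]
    v = 0.5 + 0.25 * (r0 - r1) / curvature
    return min(1.0, max(0.0, v))          # keep v in [0, 1]

def r_along_step(v):
    """Hypothetical R along the search direction, minimal at v = 0.3."""
    return (v - 0.3) ** 2 + 1.0

v_opt = hartley_step(r_along_step(0.0), r_along_step(0.5), r_along_step(1.0))
```

For an exactly quadratic R along the search direction the formula recovers the minimiser (here 0.3); for a general R it is only an approximation of the line minimum.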
Further modifications of the Gauss–Newton method are given by Marquardt
(1963, 1970) and Nash (1979).


Program Hint 
With SPSS an LS estimator can be obtained as follows. 

A data file with (x, y)-values, as for instance that for the growth of hemp plants 
given by Barath et al. (1996), is needed and shown in Figure 9.1. The value 20 for 
age was added for later calculations and does not influence the parameter esti- 
mation, due to the missing height value. 

We choose at first 


Regression 
Nonlinear 


and get the window in Figure 9.2. 

In this window first the parameters with their initial values must be put in, and 
then the regression function must be programmed. We choose at first the logis- 
tic regression from Section 9.6.3. The programmed function and the initial 
values of the parameters are given in Figure 9.3. 

After many iterations an unsatisfactory result with an error MS of 506.9 as 
shown in Figure 9.4 came out. 

With a bad choice of the initial values, we get a relative (but no absolute) min- 
imum of R(@). 




Figure 9.1 Data file of hemp data with additional calculated values (explained later). 
Source: Reproduced with permission of IBM. 


Figure 9.2 Start of non-linear regression in SPSS. Source: Reproduced with permission of IBM. 




Figure 9.3 Programmed function and the initial values. Source: Reproduced with permission 
of IBM. 


Figure 9.4 Result of the calculation with the initial parameter values from Figure 9.3. 
Source: Reproduced with permission of IBM. 


We now choose other initial values with the help of the data. For $x \to \infty$ the
logistic function reaches for negative $\gamma$ the value $\alpha$; we replace it by the
rounded maximum 122. When the growth starts (x = 0), the value of the function is
about $\frac{122}{1 + \beta}$, and we replace this by the rounded smallest value 8 and
obtain $\beta = 14.25$. Finally we choose $\gamma = -0.1$ and change the initial values
correspondingly.
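The heuristic for the initial values is simple arithmetic and can be written down directly. The sketch assumes the logistic function in the form $\alpha/(1 + \beta e^{\gamma x})$; the helper name `logistic_start_values` is hypothetical.

```python
import math

def logistic(x, alpha, beta, gamma):
    return alpha / (1.0 + beta * math.exp(gamma * x))

def logistic_start_values(y_max, y_at_zero, gamma0=-0.1):
    """alpha from the upper asymptote, beta from f(0) = alpha / (1 + beta)."""
    alpha0 = y_max
    beta0 = alpha0 / y_at_zero - 1.0
    return alpha0, beta0, gamma0

# Rounded maximum 122 and rounded smallest value 8, as in the text.
alpha0, beta0, gamma0 = logistic_start_values(122.0, 8.0)
print(alpha0, beta0, gamma0)   # beta0 = 122/8 - 1 = 14.25
```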

Now we obtain in Figure 9.5 a global minimum with a residual MS error of 3.71,
and the estimated logistic regression function is

$$\hat f(x) = \frac{126.22}{1 + 19.681\, e^{-0.46x}}.$$

Another possibility is to try some function already programmed in SPSS. For
this we use the commands


Analyze
Regression
Curve Estimation


We receive Figure 9.6 where we already have chosen the cubic regression (logis- 
tic does not mean the function used before), and with save we gave the com- 
mand to calculate the predicted values FIT and the error terms RES, which 
are already shown in Figure 9.1. 

In Figure 9.7 we find the graph of the fitted cubic regression with extrapola- 
tion to age 20. 

The MS error is the sum of squares of the residuals divided by df= 10 = 14 - 4, 
because four parameters have been estimated in the cubic regression. We 


Figure 9.5 Result of the calculation with improved initial parameter values. 


Source: Reproduced with permission of IBM. 




Figure 9.6 Start of curve estimation in the regression branch of SPSS. Source: Reproduced 


with permission of IBM. 


receive MS error = 7.16. This is larger than the corresponding value 3.71 in
Figure 9.5, and it seems that the logistic regression gives a better fit to the data.
But even if the MS error for the cubic regression were smaller than that of
the logistic regression, one may have doubts about using the former. From the
graph we find that extrapolation (to an age of 20) gives a terrible result for
the cubic function, which can be seen in the graph as well as in the predicted value

Figure 9.7 Graph of the fitted cubic regression with extrapolation to age 20. 
Source: Reproduced with permission of IBM. 


-15.0 in Figure 9.1. The predicted height for the logistic regression at age 20 is 
129.3, a reasonable result. The reason is that the logistic regression is an integral 
of a differential equation for growth processes. 


9.1.2 Internal Regression 


The principle of internal regression goes back to Hotelling (1927). Later Hartley
(1948) developed, for the simple intrinsically non-linear regression with equidis-
tant $x_i$, a method for the case that f(x) is the integral of a linear differential
equation of first order. The observed values $y_i$ are not fitted to the regression
function but approximately to the generating differential equation, by approximating
the differential quotient. This method has later been extended to intrinsically
non-linear regression functions that are integrals of a linear homogeneous dif-
ferential equation of higher order with constant coefficients, and to non-linear
differential equations (Scharf, 1970).

We restrict ourselves in the following outline to the generalised method of
internal regression for homogeneous linear differential equations of order k
of the form

$$f^{(k)} + \sum_{l=1}^{k} b_l f^{(l-1)} = 0 \quad (k > 0,\ \text{integer}) \qquad (9.13)$$

with $f^{(l)} = \frac{d^l f(x)}{dx^l}$, unknown real $b_l$ and a k-times continuously
differentiable function $f(x, \theta) = f^{(0)}$. (W.l.o.g. the absolute term was omitted.)
To obtain a general solution of this differential equation, we need at first the
roots of the characteristic equation

$$r^k + \sum_{l=1}^{k} b_l r^{l-1} = 0. \qquad (9.14)$$


Each real root $r^*$ of (9.14) with the multiplicity v corresponds to the v solutions

$$x^l e^{r^* x} \quad (l = 0, \dots, v - 1) \qquad (9.15)$$

of (9.13). Because the $b_l$ must be real, complex roots of the characteristic equation
can only occur in complex conjugate pairs. We consider here only such cases
where (9.14) has only simple real roots $r_1, \dots, r_t$, so that the general solution
of (9.13) can be written as a linear combination of the special solutions (9.15)
with real coefficients $c_l$ $(l = 1, \dots, t)$ of the form

$$f(x, \theta) = f(x) = \sum_{l=1}^{t} c_l e^{r_l x} \quad
\left(\theta = (c_1, \dots, c_t, r_1, \dots, r_t)^T\right). \qquad (9.16)$$


We now have to estimate the coefficients $b_l$ in (9.13) in place of the para-
meters of $f(x, \theta) = f(x)$ in (9.16) and use the stochastic model

$$f^{(t)}(x_i) + \sum_{l=1}^{t} b_l f^{(l-1)}(x_i) = e_i \quad (i = 1, \dots, n), \quad n > t. \qquad (9.17)$$

We assume that the vector $e = (e_1, \dots, e_n)^T$ is $N(0_n, \sigma^2 I_n)$-distributed and we
wish to determine the LS estimators $\hat b_l$ of the $b_l$ in (9.17). Of course we now
consider a model different from (9.2): we assume additive error terms in the
differential equation and not, as in (9.2), in its integral. The applicability of the
internal regression depends on the tenability of model (9.17) (at least
approximately). The $\hat b_l$ are calculated so that

$$\sum_{i=1}^{n} \left[ f^{(t)}(x_i) + \sum_{l=1}^{t} \hat b_l f^{(l-1)}(x_i) \right]^2
= \min_{-\infty < b_l < \infty} \sum_{i=1}^{n} \left[ f^{(t)}(x_i) + \sum_{l=1}^{t} b_l f^{(l-1)}(x_i) \right]^2 \qquad (9.18)$$

holds.
To replace the differential quotients by difference quotients, we need the fol-
lowing notations, assuming $x_1 < x_2 < \dots < x_n$:

$$\Delta_i^{0} = y_i, \quad
\Delta_i^{s} = \frac{\Delta_{i+1}^{s-1} - \Delta_{i-1}^{s-1}}{x_{i+1} - x_{i-1}}. \qquad (9.19)$$

The $y_i$ are the observed values or, in case of multiple measurements, the means of
the observed values at $x_i$. From (9.18) we obtain, by differentiating with respect to
the $b_l$, setting these derivatives to zero and using

$$f^{(s)}(x_i) \approx \Delta_i^{s}, \qquad (9.20)$$

the approximate equations for the $\hat b_l$:

$$\sum_{i=t+1}^{n-t} \left( \Delta_i^{t} + \sum_{l=1}^{t} \hat b_l \Delta_i^{l-1} \right) \Delta_i^{h-1} = 0
\quad (h = 1, \dots, t). \qquad (9.21)$$

These linear simultaneous equations in the $\hat b_l$ can easily be solved. From (9.14)
and with the $\hat b_l$, we obtain estimates $\hat r_1, \dots, \hat r_t$ for the $r_l$ and
correspondingly by (9.16) the general solution

$$f(x_i) = \sum_{l=1}^{t} c_l e^{\hat r_l x_i}.$$

With $z_{li} = e^{\hat r_l x_i}$ the $c_l$ are estimated as regression coefficients of a multiple
linear regression problem with the model equation

$$y_i = \sum_{l=1}^{t} c_l z_{li} + e_i. \qquad (9.22)$$

Transition to random variables gives

$$\hat\theta = (\hat c_1, \dots, \hat c_t, \hat r_1, \dots, \hat r_t)^T$$

as estimator of $\theta$ by internal regression.
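For the simplest case t = 1 the procedure can be sketched in a few lines. The sketch assumes the differential equation is read as $f' + b f = 0$, so that $f(x) = c\,e^{rx}$ with $r = -b$, and that the derivative is replaced by the central difference quotient at the interior points; the function name and the noise-free data are illustrative only.

```python
import math

def internal_regression_first_order(x, y):
    """Fit f(x) = c * exp(r * x) via the differential equation f' + b f = 0."""
    n = len(x)
    interior = range(1, n - 1)       # central differences need both neighbours
    d1 = {i: (y[i + 1] - y[i - 1]) / (x[i + 1] - x[i - 1]) for i in interior}
    # LS estimate of b from sum (d1_i + b * y_i)^2 -> min
    b_hat = -sum(d1[i] * y[i] for i in interior) / sum(y[i] ** 2 for i in interior)
    r_hat = -b_hat                   # root of the characteristic equation r + b = 0
    # Second stage: linear regression of y on z = exp(r_hat * x) without intercept
    z = [math.exp(r_hat * xi) for xi in x]
    c_hat = sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * zi for zi in z)
    return c_hat, r_hat

# Hypothetical noise-free data from f(x) = 3 exp(-0.5 x) on an equidistant grid.
x = [0.1 * i for i in range(21)]
y = [3.0 * math.exp(-0.5 * xi) for xi in x]
c_hat, r_hat = internal_regression_first_order(x, y)
print("c_hat = %.4f, r_hat = %.4f" % (c_hat, r_hat))
```

Even without noise the estimates differ slightly from 3 and -0.5, because the difference quotient only approximates the derivative; this is one reason why such estimates are mainly useful as initial values for an iteration.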


9.1.3. Determining Initial Values for Iteration Methods 


The convergence of iteration methods to minimise non-linear functions and to
solve non-linear simultaneous equations depends extremely on the choice of
initial values. If the parameters of a function can be interpreted from a practical
application or if the parameters can roughly be determined from a graphical
representation, then a heuristic choice of an initial value $\theta_0$ of $\theta$ can be
reasonable. But if we only have the $(y_i, x_i)$ values in our computer, we can try to
find initial values by some specific methods. Unfortunately some of those methods,
like Verhagen's trapezoidal method (Verhagen, 1960) (Section 9.6), are applicable
only for special functions. A more general method is the 'internal regression'
(Section 9.1.2), which is recommended to determine initial values.

The residual variance $\sigma^2$ is estimated either by $s^2 = \frac{1}{n-p} R(\hat\theta)$ or by
$\tilde\sigma^2 = \frac{1}{n} R(\hat\theta)$. The motivation is given in Section 9.4.2.
9.2 Geometrical Properties


The problem of minimising $R(\theta)$ in (9.5) will now be discussed geometrically.
For a fixed support $(x_1, \dots, x_n)$ in (9.4), the function $\eta(\theta)$ defines an
expectation surface in $R^n$.


9.2.1 Expectation Surface and Tangent Plane


First we give the following definition.


Definition 9.3 The set

$$ES = \{Y^* \mid \exists\,\theta: Y^* = \eta(\theta)\} \qquad (9.23)$$

is called the expectation surface of the regression function $\eta(\theta)$ in $R^n$.

If $\sigma^2 > 0$, an observed value Y with probability 1 does not lie in ES.

From Definition 9.2 we may conclude that the LS estimate $\hat\theta$ is just that value
in $\Omega$ for which the distance between Y and $\eta(\hat\theta)$ is a minimum, or in other
words $\hat\eta = f(x, \hat\theta)$ is the orthogonal projection of Y on ES.

The distance is the length of the vector orthogonal to the tangent plane of the
expectation surface at the point $\eta(\theta)$ and has its (not necessarily unique) min-
imum at $\eta(\hat\theta)$.


Example 9.3 Let $\theta \in \Omega = R^1$, that is, p = 1, $x \in R^1$, and let us consider the
function

$$f(x, \theta) = \frac{10}{1 + 2e^{\theta x}}. \qquad (9.24)$$

Further let n = 2, $x_1 = 1$, $x_2 = 2$. For

$$Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \quad\text{and}\quad
\eta(\theta) = \begin{pmatrix} \dfrac{10}{1 + 2e^{\theta}} \\[2ex] \dfrac{10}{1 + 2e^{2\theta}} \end{pmatrix}$$

we consider the four cases: $(y_{11}; y_{21}) = (4; 8)$; $(y_{12}; y_{22}) = (7; 3)$;
$(y_{13}; y_{23}) = (1.25; 2.5)$ and $(y_{14}; y_{24}) = (1.25; 1.5)$.

For each case (i = 1, 2, 3, 4), $R(\theta \mid y_{1i}, y_{2i})$ was calculated as a function of
$\theta$ (Table 9.1), and the graphs of these four functions are shown in Figure 9.8.

The coordinates of the expectation surface in $R^2$, which because p = 1 is an
expectation curve (and the tangent plane is a tangent), are shown in
Table 9.2. From Definition 9.3 it follows that the expectation surface does
not depend on the observations. The expectation curve, the observed points
$Y_i$ (i = 1, 2, 3, 4) and two tangents are shown in Figure 9.9.
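The quantities of this example are easy to recompute. The sketch below assumes the function $10/(1 + 2e^{\theta x})$ with $x_1 = 1$, $x_2 = 2$ and evaluates the expectation curve and the residual sum of squares on a grid of $\theta$ values; the grid from -1.5 to 1.0 in steps of 0.1 and the helper names are choices made here for illustration.

```python
import math

def eta(theta, xs=(1.0, 2.0)):
    """Point of the expectation curve for a given theta."""
    return [10.0 / (1.0 + 2.0 * math.exp(theta * x)) for x in xs]

def residual_ss(theta, y):
    """R(theta | y1, y2): squared distance between Y and eta(theta)."""
    return sum((yi - ei) ** 2 for yi, ei in zip(y, eta(theta)))

cases = [(4.0, 8.0), (7.0, 3.0), (1.25, 2.5), (1.25, 1.5)]
grid = [round(-1.5 + 0.1 * k, 1) for k in range(26)]

# Grid value of theta with the smallest R for each of the four observations
best = [min(grid, key=lambda t, y=y: residual_ss(t, y)) for y in cases]

for y, t in zip(cases, best):
    print(y, "grid minimum at theta =", t, "with R = %.2f" % residual_ss(t, y))
```

The grid minimum is only a coarse approximation of the LS estimate, but it shows how the position of Y relative to the expectation curve determines where R is smallest.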


Table 9.1 Values of the functionals $R(\theta)$ for four pairs of values of Example 9.3
(values > 30 are not included; entries not listed are marked by –).

$\theta$   $R(\theta \mid y_{11}, y_{21})$   $R(\theta \mid y_{12}, y_{22})$   $R(\theta \mid y_{13}, y_{23})$   $R(\theta \mid y_{14}, y_{24})$
-1.5    9.69    –       –       –
-1.4    8.10    –       –       –
-1.3    6.61    –       –       –
-1.2    5.24    –       –       –
-1.1    4.05    27.9    –       –
-1.0    3.12    25.3    –       –
-0.9    2.53    22.6    –       –
-0.8    2.37    20.0    –       –
-0.7    2.73    17.6    –       –
-0.6    3.68    15.5    26.4    –
-0.5    5.28    13.8    21.3    28.8
-0.4    7.54    12.6    16.8    23.3
-0.3    10.45   11.9    12.9    18.4
-0.2    13.94   11.9    9.60    14.1
-0.1    17.91   12.5    7.00    10.6
0       22.22   13.6    5.03    7.70
0.1     26.75   15.1    3.64    5.45
0.2     –       17.0    2.74    3.76
0.3     –       19.2    2.23    2.54
0.4     –       21.5    2.03    1.70
0.5     –       23.9    2.06    1.16
0.6     –       26.4    2.23    0.85
0.7     –       28.7    2.51    0.71
0.8     –       –       2.85    0.68
0.9     –       –       3.21    0.74
1       –       –       3.57    0.84
1.2     –       –       –       1.31


The $\theta$ scale at the expectation curve can be found in the first column of
Table 9.2. For regression functions $f(x, \theta)$ that are non-linear in $\theta$ the
expectation surface is curved, and the curvature depends on $\theta$; in a neighbourhood
$U(\theta_0)$ of $\theta_0$ it can be defined as the degree of the deviation of the
expectation surface from the
1.31 
Figure 9.8 Graphs of $R(\theta, Y)$ from Table 9.1.


Table 9.2 Coordinates of the expectation curve of Example 9.6. 


‘ 10 10 
1+2e? 1+ 2e2? 
-1.9 7.70 9.57 
-1.7 7.32 9.37 
-15 6.91 9.09 
-1.3 6.47 8.71 
-1.1 6.00 8.18 
-0.9 5.52 7.52 
-0.82 5.32 7.20 
-0.7 5.02 6.70 
-0.5 4.52 5.76 
-0.3 4.03 4.77 
-0.24 3.89 4.47 
-0.1 3.56 3.79 
0 3.33 3.33 
0.2 2.90 2.51 
0.4 2.51 1.83 
0.44 2.44 1.72 
0.6 2.15 1.31 
0.78 1.86 0.95 
1.0 1.55 0.63 


2.0 0.63 0.09 
Figure 9.9 Expectation vectors of the function $10/(1 + 2e^{\theta x})$ for $x = 1$ and $x = 2$ and four observations.


tangent plane at $\theta_0$. The coordinate system on the expectation surface is not
uniform; that is, for $\theta_3 \ne \theta_1$,

$$\theta_4 - \theta_3 = \theta_2 - \theta_1, \quad \theta_i \in \Omega \ (i = 1, 2, 3, 4)$$

does not necessarily imply

$$\|\eta(\theta_4) - \eta(\theta_3)\| = \|\eta(\theta_2) - \eta(\theta_1)\|.$$

In this chapter, we assume that $R(\theta)$ has a unique minimum for all $Y$.
The expectation surface is independent of the parametrisation of the function,
a notion defined below.
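The non-uniformity of the coordinate system can be checked numerically. The following minimal sketch assumes the expectation curve tabulated in Table 9.2, that is, $\eta(\theta) = (10/(1+2e^{\theta}), 10/(1+2e^{2\theta}))^T$; the function names and the chosen parameter values are ours and serve only as an illustration.

```python
import math

def eta(theta):
    """Point of the expectation curve for f(x, theta) = 10 / (1 + 2*exp(theta*x)) at x = 1, 2."""
    return (10.0 / (1.0 + 2.0 * math.exp(theta)),
            10.0 / (1.0 + 2.0 * math.exp(2.0 * theta)))

def distance(theta_a, theta_b):
    """Euclidean distance between two points of the expectation curve."""
    return math.dist(eta(theta_a), eta(theta_b))

# Two pairs of parameter values with the same spacing of 0.5 (illustrative choice)
d_21 = distance(-0.5, 0.0)   # theta_1 = -0.5, theta_2 = 0.0
d_43 = distance(1.5, 2.0)    # theta_3 = 1.5,  theta_4 = 2.0

print(f"||eta(theta_2) - eta(theta_1)|| = {d_21:.3f}")
print(f"||eta(theta_4) - eta(theta_3)|| = {d_43:.3f}")
```

Equal steps in $\theta$ give clearly different distances on the curve, which is what the statement above expresses.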


Definition 9.4 A regression function $f(x, \theta)$ is reparametrised if a one-to-one
transformation $g: \Omega \to \Omega^*$ is applied to $\theta$ and $f(x, \theta)$ is written as

$$f(x, \theta) = \psi[x, g(\theta)] = \psi(x, \theta^*) \qquad (9.25)$$

with $\theta^* = g(\theta)$ [$\theta = g^{-1}(\theta^*)$].
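Example 9.4 The function

$$f(x, \theta) = \frac{\alpha}{1 + \beta e^{\gamma x}} \quad \text{with } \theta = (\alpha, \beta, \gamma)^T,\ \beta > 0 \qquad (9.26)$$

is called the logistic function. We assume $\alpha\beta\gamma \ne 0$. The first and second derivatives
with respect to $x$ are

$$\frac{df(x, \theta)}{dx} = -\frac{\alpha\beta\gamma e^{\gamma x}}{(1 + \beta e^{\gamma x})^2}$$

and

$$\frac{d^2 f(x, \theta)}{dx^2} = -\alpha\beta\gamma\, \frac{\gamma e^{\gamma x}(1 + \beta e^{\gamma x})^2 - 2 e^{\gamma x}(1 + \beta e^{\gamma x})\beta\gamma e^{\gamma x}}{(1 + \beta e^{\gamma x})^4} = -\alpha\beta\gamma^2 e^{\gamma x}\, \frac{1 - \beta e^{\gamma x}}{(1 + \beta e^{\gamma x})^3}.$$

Let the parameter vector lie in a subspace $\Omega_0$ of $\Omega$ where $f(x, \theta)$ has an inflection
point $(x_\omega, \eta_\omega)$. At this point the numerator of the second derivative has to be zero
so that

$$1 - \beta e^{\gamma x_\omega} = 0; \quad \beta = e^{-\gamma x_\omega} \quad \text{and} \quad x_\omega = -\frac{1}{\gamma} \ln \beta$$

and

$$f_\omega = \frac{\alpha}{2} = \eta_\omega.$$

At $x_\omega$ the second derivative $\frac{d^2 f(x, \theta)}{dx^2}$ changes its sign, so that we really get an inflection
point.
Because $\beta = e^{-\gamma x_\omega}$, we get

$$f(x, \theta) = \frac{\alpha}{1 + \beta e^{\gamma x}} = \frac{\alpha e^{\gamma x_\omega}}{e^{\gamma x_\omega} + e^{\gamma x}} = \frac{\alpha}{2}\left\{1 + \frac{e^{-\gamma x} e^{\gamma x_\omega} - 1}{e^{-\gamma x} e^{\gamma x_\omega} + 1}\right\}$$

$$= \frac{\alpha}{2}\left\{1 + \tanh\left[-\frac{\gamma}{2}(x - x_\omega)\right]\right\} = a\{1 + \tanh[b(x - c)]\} = \psi(x, \theta^*) \qquad (9.27)$$

with

$$\theta^* = (a, b, c)^T, \quad a = \frac{\alpha}{2}, \quad b = -\frac{\gamma}{2}, \quad c = -\frac{1}{\gamma} \ln \beta.$$

The logistic function can therefore also be written as a three-parameter hyperbolic
tangent function. The parameter $c$ is the x-coordinate of the inflection point and
$a = \alpha/2$ the y-coordinate of the inflection point.
Both versions of $f(x, \theta)$ in (9.27) have, of course, the same expectation curve.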
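The equivalence of the two parametrisations can be verified numerically. The sketch below assumes the relations $a = \alpha/2$, $b = -\gamma/2$, $c = -(\ln\beta)/\gamma$ stated with (9.27); the parameter values and function names are illustrative assumptions only.

```python
import math

def logistic(x, alpha, beta, gamma):
    """Logistic function (9.26): alpha / (1 + beta * exp(gamma * x))."""
    return alpha / (1.0 + beta * math.exp(gamma * x))

def to_tanh_parameters(alpha, beta, gamma):
    """Map theta = (alpha, beta, gamma) to theta* = (a, b, c) as in (9.27)."""
    return alpha / 2.0, -gamma / 2.0, -math.log(beta) / gamma

def tanh_form(x, a, b, c):
    """Hyperbolic tangent version: a * (1 + tanh(b * (x - c)))."""
    return a * (1.0 + math.tanh(b * (x - c)))

alpha, beta, gamma = 10.0, 2.0, 0.5          # illustrative values with beta > 0
a, b, c = to_tanh_parameters(alpha, beta, gamma)

xs = [-4.0 + 0.5 * k for k in range(17)]      # grid on [-4, 4]
max_gap = max(abs(logistic(x, alpha, beta, gamma) - tanh_form(x, a, b, c)) for x in xs)
print(f"largest difference between both versions: {max_gap:.2e}")
print(f"value at the inflection point x = c: {logistic(c, alpha, beta, gamma):.3f}")
```

Both versions trace the same curve, so the expectation curve does not change, while the meaning of the parameters does.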
The function of Example 9.3 written as a hyperbolic tangent function is

$$f(x, \theta) = 5\left\{1 + \tanh\left[-\tfrac{1}{2}(\ln 2 + \theta x)\right]\right\},$$

and the vector of the coordinates of the expectation curve is

$$\eta(\theta) = \left\{5\left(1 + \tanh\left[-\tfrac{1}{2}(\ln 2 + \theta)\right]\right),\ 5\left(1 + \tanh\left[-\tfrac{1}{2}(\ln 2 + 2\theta)\right]\right)\right\}^T.$$

If, as the result of a reparametrisation, a new parameter depends non-linearly on the
original parameters, the curvature changes, as is shown below.


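Example 9.5 The function

$$f(x, \theta) = (\alpha + \beta e^{\gamma x})^{\delta} \quad (\theta = (\alpha, \beta, \gamma, \delta)^T,\ \alpha\beta < 0,\ \gamma > 0,\ \delta > 0) \qquad (9.28)$$

was used in Richards (1959) to model the growth of plants under the restrictions
in (9.28) of the parameter space. The function (9.28) is called Richards' function.
We rewrite this function also as

$$\psi(x, \theta^*) = A\left\{1 + B \exp\left[\frac{C}{A}(D - x)(B + 1)^{1 + 1/B}\right]\right\}^{-1/B}.$$

The connection between $\theta$ and $\theta^* = (A, B, C, D)^T$ is given by the relationship
($\theta = g^{-1}(\theta^*)$):

$$\alpha = A^{-B}, \quad \beta = B A^{-B} \exp\left[\frac{C}{A} D (B + 1)^{1 + 1/B}\right],$$

$$\gamma = -\frac{C}{A}(B + 1)^{1 + 1/B}, \quad \delta = -\frac{1}{B}$$

and conversely

$$A = \alpha^{\delta}, \quad B = -\frac{1}{\delta}, \quad C = -\gamma \alpha^{\delta} \left(1 - \frac{1}{\delta}\right)^{\delta - 1}, \quad D = -\frac{1}{\gamma} \ln\left(-\frac{\beta\delta}{\alpha}\right).$$

The parameters $A$, $C$ and $D$ can be interpreted as follows:

A: Final value ($A = \lim_{x \to \infty} \psi(x, \theta^*)$).
D: x-coordinate of the inflection point of the curve of $\psi(x, \theta^*)$.
C: Value of $\frac{d\psi(x, \theta^*)}{dx}$ at $x = D$ (maximal growth).

This can be used to determine initial values for the iterative calculation of LS
estimates.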
For the case $p = 1$ Hougaard (1982) showed that a parameter transformation $g$
exists, leading to a parameter $\alpha = g(\theta)$, so that the asymptotic variance of its
estimator $\hat\alpha$ is nearly independent of $\theta$. Asymptotic skewness and asymptotic bias of $\hat\alpha$
are zero, and the likelihood function of $\hat\alpha$ is approximately that of a normal
distribution, if the error terms $e$ in (9.2) are normally distributed (this
parametrisation still has parameter-dependent curvature; see Section 9.2.2).

This transformation is given by

$$\frac{\partial g}{\partial \theta} = c\sqrt{F^T F} \qquad (9.29)$$

with an arbitrary constant $c$.
Generalisations of this result for $p > 1$ are given in Holland (1973) and
Hougaard (1984).


Theorem 9.3 Let $\eta = \eta(\theta)$ in (9.2) be three times continuously differentiable
(with respect to $\theta$) in $\Omega$ and let $\Omega$ be connected. Then a covariance-stabilising
transformation $g = (g_1, \dots, g_p)$ is a solution of


dg (dag ag\ ra -ifdg ag\* 
00:00; = ($ ge) (F F) 00," 00, ki 
with $k_j$ and $F$ defined in (9.4).


Hougaard (1984) could show that for functions of type $f(x_i, \theta) = \theta_1 + \theta_2 h(x_i, \theta_3)$
such a parameter transformation exists.


9.2.2 Curvature Measures 


As shown in Example 9.4, the same function can be written in several forms by
reparametrisation.


Definition 9.5 Given two continuously differentiable functions $f(x, \theta)$ and
$h(x, \delta)$, $\theta \in \Omega$, $\delta \in \Delta$, let $g(\theta)$ be a one-to-one mapping of $\Omega$ onto $\Delta$. For all
$x \in R$ let $f(x, \theta) = h(x, g(\theta)) = h(x, \delta)$. Then we call $h(x, \delta)$ a reparametrisation of $f(x, \theta)$
(and vice versa).


Do the estimators in a non-linear regression model have different properties
if different reparametrisations are used? Does a reparametrisation exist
leading to a smaller bias than the original parametrisation? More generally,
how does a reparametrisation influence the curvature? To
answer such questions, we first need a definition of curvature.

Curvature measures are often defined via the second derivative of the regression
function (with respect to the parameter vector). Such a local measure (depending on
the parameter) must suitably be globalised, for instance, by a supremum (see
Beale, 1960; Bates and Watts, 1988).
Morton (1987) proposed statistically motivated curvature measures based
on higher moments of a symmetrical error distribution. We follow Morton's
approach.

In model equation (9.2) we assume now that the error terms $e_i$ are independent
and identically distributed (i.i.d.) with expectation 0 and positive finite
variance $\sigma^2$. Further we assume that this distribution is symmetric. We write the LS
estimator $\hat\theta = (\hat\theta_1, \dots, \hat\theta_p)^T$ in dependency of the error terms as $\hat\theta(e)$. With

$$u_j = \tfrac{1}{2}\{\hat\theta_j(e) - \hat\theta_j(-e)\}, \qquad (9.30)$$

$$v_j = \tfrac{1}{2}\{\hat\theta_j(e) + \hat\theta_j(-e)\} - \theta_j, \qquad (9.31)$$

we obtain

$$\hat\theta_j = \theta_j + u_j + v_j.$$

From these assumptions, it follows that $E\{\hat\theta(e)\} = E\{\hat\theta(-e)\}$ and from this
$E(u_j) = 0$. Then the bias of the $j$th component of the LS estimator is

$$b_j = E(\hat\theta_j - \theta_j) = E(v_j).$$
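The decomposition in (9.30) and (9.31) can be made concrete with a toy model. The sketch below is not taken from Morton (1987); it assumes a single observation $y = e^{\theta} + e$, for which the LS estimate is $\hat\theta = \ln y$ (as long as $y > 0$). All names are illustrative.

```python
import math

def ls_estimate(e, theta):
    """LS estimate for one observation y = exp(theta) + e (requires y > 0)."""
    return math.log(math.exp(theta) + e)

def morton_parts(e, theta):
    """Odd part u and even part v of the estimation error, as in (9.30) and (9.31)."""
    plus = ls_estimate(e, theta)
    minus = ls_estimate(-e, theta)
    u = 0.5 * (plus - minus)
    v = 0.5 * (plus + minus) - theta
    return u, v

theta = 0.0
for e in (0.1, 0.2, 0.4):
    u, v = morton_parts(e, theta)
    # theta + u + v reproduces the estimate; v has the same sign for +e and -e
    print(f"e = {e:+.1f}: u = {u:+.4f}, v = {v:+.4f}, "
          f"theta + u + v = {theta + u + v:+.4f}, estimate = {ls_estimate(e, theta):+.4f}")
```

In this toy model $u$ changes sign with $e$ while $v$ does not, so for a symmetric error distribution only $v$ contributes to the bias.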


Now curvature measures for the components of $\theta$ following Morton (1987) can
be defined.


Definition 9.6 A measure for the curvature (non-linearity) of the $j$th component
$\theta_j$ ($j = 1, \dots, p$) of $\theta$ in (9.3) with the symbols in (9.30) and (9.31) is given by

$$N_j = \frac{\operatorname{var}(v_j)}{\operatorname{var}(\hat\theta_j)} = \frac{\operatorname{var}(v_j)}{\operatorname{var}(u_j) + \operatorname{var}(v_j)}.$$

We define by linear regression of $v_j$ on all products $u_k u_l$ for each $j$ a
$(p \times p)$-matrix $C_j$ so that $\operatorname{cov}(u_k u_l, v_{2j}) = 0$ for each pair $(k, l)$ if

$$v_{2j} = v_j - v_{1j} \quad \text{and} \quad v_{1j} = u^T C_j u$$

with $u = (u_1, \dots, u_p)^T$. The special choice of $C_j$ has the advantage that the two
components $v_{1j}$ and $v_{2j}$ of $v_j$ are uncorrelated and

$$\operatorname{var}(v_j) = \operatorname{var}(v_{1j}) + \operatorname{var}(v_{2j})$$

follows.

In the following we decompose the curvature measure in Definition 9.6 into
two parts. The first part becomes small by a suitable reparametrisation
(theoretically 0).


Definition 9.7 We call

$$N_{1j} = \frac{\operatorname{var}(v_{1j})}{\operatorname{var}(\hat\theta_j)} \qquad (9.32)$$

reparametrisation-dependent curvature (non-linearity) of the component $\theta_j$ of
$\theta$ and

$$N_{2j} = \frac{\operatorname{var}(v_{2j})}{\operatorname{var}(\hat\theta_j)} \qquad (9.33)$$

intrinsic curvature (non-linearity) of the component $\theta_j$ of $\theta$.
It is easy to see that $N_j = N_{1j} + N_{2j}$.
Morton (1987) made a proposal for finding a suitable reparametrisation. With
the matrix $F$ in (9.4), we write

$$\frac{1}{n} F^T(\theta) F(\theta) = I_n(\theta) = (m_{ij}) \quad \text{and} \quad I_n^{-1}(\theta) = (m^{ij}).$$

Let $L$ be defined by $L^T I_n(\theta) L = I_p$. With the symbols in (9.4) we define


luv; = D Kuh (xi, 9), luvjl = D furl A)kji(xis 0), 


D; = (duvj) with duyj = Som!" tay 


and 


var ( u?) var (vj) 
Morton showed that the following first-order approximation holds:


o T 2 
Nyx 5 tr{ (L™DL)’S, 


T 
Definition 9.8 0° = (9%,.-.9%) is called with 


Gt = 4 (j= 1,..p) 


2k var (uj)— var(v1j)— var(v2,) } 


Nj; — optimal reparametrisation of f(9) in (9.3) for all Nj. 


9.3 Asymptotic Properties and the Bias 
of LS Estimators 


The properties of LS estimators differ strongly between intrinsically non-linear
and linear (including quasilinear) regression. In intrinsically non-linear
regression, we know nearly nothing about the distribution of $\hat\theta$, $s^2$ and $\tilde\sigma^2$.
The quantity

$$\frac{1}{\sigma^2} R(\hat\theta) = \frac{s^2 (n - p)}{\sigma^2}$$

is not chi-squared-distributed, even if the error terms are normally distributed.
Also the bias

$$v_n(\theta) = E(\hat\theta_n - \theta) \qquad (9.34)$$

is only approximately known. Nevertheless, in the next section we propose
confidence estimators and tests, which hold pregiven risks approximately,
and the larger the sample size, the better. We show this in this section by
presenting the important results of Jennrich (1969) and Johansen (1984) and,
concerning the bias of the estimators, results from Box (1971).

We assume that the parameter space $\Omega \subset R^p$ is compact and $f(x, \theta)$ is twice
continuously differentiable with respect to $\theta$.

At first we introduce Jennrich's tail products, which simplify the presentation
below.

For finite $n$ the set of measurement points $(x_1, \dots, x_n)$ (the support of a discrete
design) can formally be considered as a discrete probability measure with a
distribution function $F_n(x)$ (even if there is no random variable $x$). If $n$ tends
to infinity, then $F_n(x)$ tends to a limiting distribution function $F(x)$.
Then for bounded continuous functions $s$ and $t$ with $s, t: R \otimes \Omega \to R$
and $(\theta, \theta^*) \in \Omega \otimes \Omega$ we define

$$\int_X s(x, \theta)\, t(x, \theta^*)\, dF(x) = (s(\theta), t(\theta^*)). \qquad (9.35)$$


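Definition 9.9 We say the sequence $\{g_i\}$ ($i = 1, 2, \dots$) of functions $g_i: R \otimes \Omega \to R$
has a tail product $(g, g)$ as in (9.35), if

$$\frac{1}{n} \sum_{i=1}^{n} g_i(\theta) g_i(\theta^*), \quad \theta, \theta^* \in \Omega,$$

tends for $n \to \infty$ uniformly in $(\theta, \theta^*) \in \Omega \times \Omega$ to $(g, g)$. If $\{g_i\}$ and $\{h_i\}$ are two
sequences $g_i: R \otimes \Omega \to R$, $h_i: R \otimes \Omega \to R$, we say that these sequences have a tailed
cross product $(g, h)$, if

$$\frac{1}{n} \sum_{i=1}^{n} g_i(\theta) h_i(\theta^*), \quad \theta, \theta^* \in \Omega,$$

converges uniformly for all $(\theta, \theta^*) \in \Omega \otimes \Omega$ to $(g, h)$.

From the continuity of all $g_i$ and $h_i$ and the uniform convergence, the
continuity of $(g, g)$ and $(g, h)$ follows.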

To understand the following theorems better, we need an extended definition
of almost sure convergence for the case of a sequence of random functions
depending on a parameter $\theta$.

In the non-stochastic case, uniform convergence (for all $\theta \in \Omega$) is defined by
the demand that for a function sequence $\{f_i(\theta)\}$ the quantity

$$\sup_{\theta \in \Omega} |f_i(\theta) - f(\theta)|$$

tends to zero for $i \to \infty$.


For random functions we extend this by
Definition 9.10 If $f(\theta)$ and $f_i(\theta)$ ($i = 1, 2, \dots$) are random functions for $\theta \in \Omega \subset R^p$
and if $(\{Y\}, \mathcal{Q}, P)$ is the common probability space of the arguments of $f$ and all $f_i$,
then $f_i$ converges uniformly almost surely in $\Omega$ to $f$ if all

$$\sup_{\theta \in \Omega} |f_i(\theta) - f(\theta)| \quad \text{for } i = 1, 2, \dots$$

are random variables and if, except for a set of $P$-measure 0, for all elements $Y \in \{Y\}$
(i.e. for all $Y \in \{Y\} \setminus N$ with a $P$-null set $N$) and for all $\varepsilon > 0$ there exists an $i_0(Y, \varepsilon)$
so that for $i \ge i_0(Y, \varepsilon)$

$$\sup_{\theta \in \Omega} |f_i(\theta, Y) - f(\theta, Y)| < \varepsilon.$$
The proof of Theorem 9.4 is based on


Lemma 9.1 (Borel-Cantelli)
If $y$ and $y_1, y_2, \dots$ are random variables with a common probability space
$(\{Y\}, \mathcal{Q}, P)$ and if for all $\varepsilon > 0$

$$\sum_{i=1}^{\infty} P\{|y_i - y| > \varepsilon\} < \infty,$$

the sequence $\{y_i\}$ converges almost surely to $y$.


The proof can, for instance, be found in Bauer (2002, p. 73) or in Feller (1961).
The proof of the lemma below can be found in Jennrich (1969, p. 637).


Lemma 9.2 Let $R = R(Y, \theta)$ be a real-valued function on $R^n \times \Omega$ and $\Omega$ a compact
subset of $R^p$, and let $R(Y, \theta)$ be for all $\theta \in \Omega$ a measurable function of $Y$ and
for all $Y \in \{Y\}$ a continuous function of $\theta$. Then a measurable mapping $\hat\theta$ of $\{Y\}$ into $\Omega$
exists so that for all $Y \in \{Y\}$

$$R[Y, \hat\theta(Y)] = \inf_{\theta \in \Omega} R(Y, \theta).$$

From this lemma it follows that LS estimators really are random variables.

Theorem 9.4 Let $g_i: \Omega \to R$ be continuous mappings of the parameter space $\Omega$
into $R$, and let the sequence $\{g_i\}$ have a tail product. If $\{u_i\}$ is a sequence of
independent $N(0, \sigma^2)$-distributed random variables, then

$$z_n = \frac{1}{n} \sum_{i=1}^{n} u_i g_i(\theta) \quad (n = 1, 2, \dots)$$

converges almost surely uniformly in $\Omega$ to 0.


We write the LS estimator introduced in Definition 9.2 now as $\hat\theta = \hat\theta_n$ (under
the normality assumption it is an ML estimator), and with $R(\theta)$ in (9.5) we write

$$\tilde\sigma_n^2 = \frac{1}{n} R(\hat\theta_n).$$
Theorem 9.5 (Jennrich)
If with the notations of (9.2) to (9.5) the $e_i$ are pairwise independent and identically
normally distributed with $E(e_i) = 0$ and $\operatorname{var}(e_i) = \sigma^2$, if $\Omega$ is compact and if

$$(f_i) = (f_i(\theta)) = (f(x_i, \theta))$$

has a tail product and

$$S(\theta^*, \theta) = |f(\theta^*) - f(\theta)|^2, \quad (\theta^*, \theta) \in \Omega \otimes \Omega$$

has a unique minimum at $\theta^* = \theta$, then $\hat\theta_n$ converges uniformly in $\theta$ almost
surely to $\theta$ and $\tilde\sigma_n^2$ converges uniformly in $\theta$ almost surely to $\sigma^2$.


For testing hypotheses and for confidence estimations, Theorem 9.6 is
important.


Theorem 9.6 (Jennrich)

Let the assumptions of Theorem 9.5 be fulfilled and $f(x, \theta)$ be twice
continuously differentiable with respect to $\theta$. Let the function sequences $\{f(x_i, \theta)\}$,
$\{f_j(x_i, \theta)\}$ ($j = 1, \dots, p$) and $\{k_{jl}(x_i, \theta)\}$ ($j, l = 1, \dots, p$) in (9.4) have tail products
and tailed cross products, respectively. Let

$$I(\theta) = \lim_{n \to \infty} \frac{1}{n} F^T(\theta) F(\theta) \qquad (9.36)$$

be non-singular.
Then for each $\theta$ from the interior of $\Omega$

$$\sqrt{n}(\hat\theta_n - \theta) \qquad (9.37)$$

is asymptotically $N(0_p, \sigma^2 I^{-1}(\theta))$-distributed.


The proof is given in Jennrich (1969, p. 639).
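We formulate the message of Theorem 9.6 so that $\sqrt{n}(\hat\theta_n - \theta)$ is asymptotically
$N(0_p, \Sigma)$-distributed with $\Sigma = \lim_{n \to \infty} n \operatorname{var}_A(\theta)$ and

$$\operatorname{var}_A(\theta) = \sigma^2 [F^T(\theta) F(\theta)]^{-1} = \sigma^2 \left(\sum_{i=1}^{n} f_j(x_i, \theta) f_k(x_i, \theta)\right)^{-1}_{j,k = 1, \dots, p}. \qquad (9.38)$$

We call $\operatorname{var}_A(\theta)$ the asymptotic covariance matrix of $\hat\theta_n$ and

$$\widehat{\operatorname{var}}_A(\hat\theta_n) = \frac{1}{n - p} R(\hat\theta_n) [F^T(\hat\theta_n) F(\hat\theta_n)]^{-1} = s_n^2 [F^T(\hat\theta_n) F(\hat\theta_n)]^{-1} \qquad (9.39)$$

the estimated asymptotic covariance matrix of $\hat\theta_n$. Moreover,

$$s_n^2 = \frac{1}{n - p} R(\hat\theta_n) \qquad (9.40)$$

is an estimator of $\sigma^2$ and asymptotically equivalent to $\tilde\sigma_n^2$.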
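The estimated asymptotic covariance matrix (9.39) is straightforward to compute once an LS estimate is available. The following sketch assumes the three-parameter exponential function $f(x, \theta) = \alpha + \beta e^{\gamma x}$; the data, the value used as $\hat\theta_n$ and all function names are hypothetical, and the LS fit itself is not shown.

```python
import math

def f(x, theta):
    alpha, beta, gamma = theta
    return alpha + beta * math.exp(gamma * x)

def gradient(x, theta):
    """Row of F: partial derivatives of f with respect to alpha, beta, gamma."""
    _, beta, gamma = theta
    e = math.exp(gamma * x)
    return [1.0, e, beta * x * e]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def estimated_asymptotic_covariance(xs, ys, theta_hat):
    """Return s^2 as in (9.40) and s^2 * (F^T F)^{-1} as in (9.39)."""
    n, p = len(xs), len(theta_hat)
    F = [gradient(x, theta_hat) for x in xs]
    FtF = [[sum(row[j] * row[k] for row in F) for k in range(p)] for j in range(p)]
    residual_ss = sum((y - f(x, theta_hat)) ** 2 for x, y in zip(xs, ys))
    s2 = residual_ss / (n - p)
    return s2, [[s2 * v for v in row] for row in invert(FtF)]

# Hypothetical data and a hypothetical LS estimate
theta_hat = (10.0, -8.0, -0.5)
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
deviations = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
ys = [f(x, theta_hat) + d for x, d in zip(xs, deviations)]

s2, cov = estimated_asymptotic_covariance(xs, ys, theta_hat)
print("s^2 =", round(s2, 4))
print("estimated standard errors:", [round(math.sqrt(cov[j][j]), 4) for j in range(3)])
```

The square roots of the diagonal elements are the standard errors used for the tests and confidence intervals of Section 9.4.2.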
Simulation results (see Rasch and Schimke, 1983) show that $s_n^2$ has a smaller
bias than $\tilde\sigma_n^2$. In the paper of Malinvaud (1970) it was shown (independently of
Jennrich) that $\hat\theta_n$ is a consistent estimator of $\theta$.

Generalisations of the results of Jennrich (mainly with general error distributions)
are given in Wu (1981) and in Ivanov and Zwanzig (1983).

Next we will discuss the bias $v_n(\theta)$ of $\hat\theta_n$.


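Theorem 9.7 (Box, 1971)
With the assumptions of Theorem 9.6 and if

$$\Delta = \hat\theta_n - \theta$$

approximately (first order) ($e = Y - \eta(\theta)$ with $\eta(\theta)$ from (9.4)) with certain
matrices $A_{p,n}$, $B^{(1)}_{n,n}, \dots, B^{(p)}_{n,n}$ has the form

$$\Delta = A_{p,n}\, e + \left(e^T B^{(1)}_{n,n} e, \dots, e^T B^{(p)}_{n,n} e\right)^T, \qquad (9.41)$$

then we get with the notations in (9.4) approximately

$$v_n(\theta) = -\frac{1}{2} \operatorname{var}_A(\theta) \sum_{i=1}^{n} F_i(\theta)\, \operatorname{tr}\left\{[F^T(\theta) F(\theta)]^{-1} K_i(\theta)\right\}. \qquad (9.42)$$


The proof can be found in Box (1971), where it is further shown how the
matrices $A_{p,n}$, $B^{(1)}_{n,n}, \dots, B^{(p)}_{n,n}$ can suitably be chosen.


Close relations exist between (9.42) and the curvature measures (see Morton,
1987).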


9.4 Confidence Estimations and Tests 


Confidence estimations and tests for the parameters of intrinsically non-linear
regression functions, or even for the regression functions themselves, cannot be
constructed as easily as in the linear case. The reason is that the estimators of $\theta$ and of
functions of $\theta$ cannot be written down explicitly in closed form and their distribution
is unknown.


9.4.1 Introduction 


Special intrinsically non-linear regression functions in the applications are not
written in the form $f(x, \theta)$ as in Definition 9.1 and the theoretical first part of this
chapter. If there are only few (two, three or four) parameters as in Section 9.5, we
often write $\theta_1 = \alpha$; $\theta_2 = \beta$; $\theta_3 = \gamma$; $\theta_4 = \delta$.
Therefore the risk of the first kind is from now on in this chapter denoted by $\alpha^*$, and
consequently we have $(1 - \alpha^*)$-confidence intervals. Analogously the risk of the
second kind is $\beta^*$.

Regarding properties of $(1 - \alpha^*)$-confidence intervals and $\alpha^*$-tests, nearly
nothing is known; we are glad if we can construct them in some cases. We
restrict ourselves at first to the construction of confidence estimations $K(Y)$
concerning $\theta$ and define a test of $H_0: \theta = \theta_0$ as

$$k(Y) = \begin{cases} 1, & \text{if } \theta_0 \notin K(Y) \\ 0, & \text{otherwise.} \end{cases}$$

Concerning confidence estimators for $\eta(\theta)$, we refer to Maritz (1962).


Williams (1962) developed a method for the construction of confidence intervals
for the parameter $\gamma$ in non-linear functions of the type

$$f(x, \theta) = \alpha + \beta g(x, \gamma), \quad \theta = (\alpha, \beta, \gamma)^T, \qquad (9.43)$$

with a real-valued function $g(x, \gamma)$, which is twice continuously differentiable with respect to
$\gamma$. Halperin (1963) generalised this method in such a way that confidence intervals
for all components of $\theta$ can be constructed.

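We consider the vector

$$[f(x_1, \theta), \dots, f(x_n, \theta)]^T = B\lambda \qquad (9.44)$$

with

$$\theta = (\lambda_1, \dots, \lambda_{p-r}, \varphi_1, \dots, \varphi_r)^T \in \Omega = \Lambda \otimes \Gamma,$$

$$\varphi = (\varphi_1, \dots, \varphi_r)^T \in \Gamma, \quad \lambda = (\lambda_1, \dots, \lambda_{p-r})^T \in \Lambda$$

and $p < n$. The $(n \times (p - r))$-matrix $B$ contains the elements $b_j(x_i, \varphi)$. The
$b_j(x_i, \varphi)$ do not depend on $\lambda$ and are twice continuously differentiable with respect to
$\varphi$. For $\varphi \ne 0_r$ the matrix $B$ has rank $p - r$.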

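We start with the model

$$Y = B\lambda + e, \quad e \sim N(0_n, \sigma^2 I_n) \qquad (9.45)$$

with $\beta^T = (\lambda^T, 0_r^T) = (\theta_1^T, \theta_2^T)$ and an $(n \times r)$-matrix $D$, so that $(B, D)$ is of rank $p$ and
(9.45) can be written as

$$Y = (B, D)\beta + e. \qquad (9.46)$$

By Theorem 8.1 we obtain LS estimates of $\theta_1$ and $\theta_2$ from

$$\hat\theta_1 = (B^T B)^{-1} B^T Y - (B^T B)^{-1} (B^T D)(U^T U)^{-1} U^T Y$$

and

$$\hat\theta_2 = (U^T U)^{-1} U^T Y,$$

respectively, where

$$U^T = D^T \left(I_n - B (B^T B)^{-1} B^T\right),$$

as the solutions

$$\hat\beta = \begin{pmatrix} B^T B & B^T D \\ D^T B & D^T D \end{pmatrix}^{-1} \begin{pmatrix} B^T \\ D^T \end{pmatrix} Y$$

of the simultaneous normal equations. These estimates depend on $\varphi$.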


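It follows from Theorem 8.2 that $\hat\theta_1$ and $\hat\theta_2$ are BLUE (because we assumed
normal distribution even LVUE) concerning $\theta_1$ and $\theta_2$ if $\varphi$ is known. From
Theorem 8.6 it follows that

$$F_1 = \frac{n - p}{p} \cdot \frac{(\hat\beta - \beta)^T (B, D)^T (B, D) (\hat\beta - \beta)}{Y^T Y - \hat\beta^T (B, D)^T (B, D) \hat\beta} \qquad (9.47)$$

is $F(p, n - p)$-distributed and

$$F_2 = \frac{n - p}{r} \cdot \frac{\hat\theta_2^T U^T U \hat\theta_2}{Y^T Y - \hat\beta^T (B, D)^T (B, D) \hat\beta} \qquad (9.48)$$

is $F(r, n - p)$-distributed.
With $F_1$, confidence regions concerning $\theta$ and with $F_2$ concerning $\varphi$ can be
constructed, due to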


Theorem 9.8 The set of all $\theta \in \Omega$ of model (9.45) with

$$F_1 \le F(p, n - p \mid 1 - \alpha^*) \qquad (9.49)$$

defines a $(1 - \alpha^*)$-confidence region concerning $\theta$, and the set of all $\varphi \in \Gamma$ with

$$F_2 \le F(r, n - p \mid 1 - \alpha^*) \qquad (9.50)$$

defines a $(1 - \alpha^*)$-confidence region concerning $\varphi$ if $D$ is independent of $\lambda$.
Williams (1962) and Halperin (1963) proposed to choose $D$ in such a way that
$F_2$ disappears if $\varphi = \hat\varphi$, that is, if $\varphi$ equals its LS estimate. From (9.7) we see that $\hat\theta$ is
a solution of

$$\lambda^T \frac{\partial B^T}{\partial \varphi_j} (Y - B\lambda) = 0 \quad (j = 1, \dots, r), \qquad B^T Y - B^T B \lambda = 0_{p-r}. \qquad (9.51)$$

With the additional assumption that in each column of $B$ exactly one component
of $\varphi$ occurs, so that

$$\frac{\partial b_k(x, \varphi)}{\partial \varphi_j}$$

differs from zero for exactly one $k^* = k(j)$, it follows from (9.51)

$$\frac{\partial B^T}{\partial \varphi_j} (Y - B\lambda) = 0, \qquad B^T Y - B^T B \lambda = 0_{p-r},$$

and we choose for $D$

$$D = (d_{ij}) = \left(\sum_{k=1}^{p-r} \frac{\partial b_k(x_i, \varphi)}{\partial \varphi_j}\right). \qquad (9.52)$$

In (9.52) each sum has exactly one summand different from zero. The
calculation of confidence regions as described above is laborious, as shown in


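Example 9.6 (Williams, 1962). Let

$$f(x, \theta) = \alpha + \beta e^{\gamma x}.$$

With

$$B^T = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ e^{\gamma x_1} & e^{\gamma x_2} & \cdots & e^{\gamma x_n} \end{pmatrix} \quad \text{and} \quad \lambda = (\alpha, \beta)^T$$

the model has the form (9.46) ($p = 3$, $r = 1$, $n > 3$). Because

$$d_{i1} = d_i = \frac{\partial}{\partial \gamma} 1 + \frac{\partial}{\partial \gamma} e^{\gamma x_i} = x_i e^{\gamma x_i},$$

we get (all sums run over $i = 1, \dots, n$)

$$B^T B = \begin{pmatrix} n & \sum e^{\gamma x_i} \\ \sum e^{\gamma x_i} & \sum e^{2\gamma x_i} \end{pmatrix}, \quad B^T D = \left(\sum x_i e^{\gamma x_i},\ \sum x_i e^{2\gamma x_i}\right)^T,$$

$$D^T D = \sum x_i^2 e^{2\gamma x_i},$$

$$(B^T B)^{-1} = \frac{1}{n \sum e^{2\gamma x_i} - \left(\sum e^{\gamma x_i}\right)^2} \begin{pmatrix} \sum e^{2\gamma x_i} & -\sum e^{\gamma x_i} \\ -\sum e^{\gamma x_i} & n \end{pmatrix}.$$

The elements $u_k$ of $U$ are

$$u_k = x_k e^{\gamma x_k} - \frac{\sum e^{2\gamma x_i} \sum x_i e^{\gamma x_i} - \sum e^{\gamma x_i} \sum x_i e^{2\gamma x_i} + e^{\gamma x_k}\left(n \sum x_i e^{2\gamma x_i} - \sum e^{\gamma x_i} \sum x_i e^{\gamma x_i}\right)}{n \sum e^{2\gamma x_i} - \left(\sum e^{\gamma x_i}\right)^2}.$$

The corresponding quantities are now inserted in $F_2$, and the values $\gamma_l(1 - \alpha^*)$ as
lower and $\gamma_u(1 - \alpha^*)$ as upper bound of a realised $(1 - \alpha^*)$-confidence interval,
respectively, can now be calculated iteratively.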
9.4.2 Tests and Confidence Estimations Based on the Asymptotic
Covariance Matrix


For practical applications the solutions given above are strongly restricted and
awkward. A way out could be to use the asymptotic covariance matrix (9.38) or
the estimated asymptotic covariance matrix (9.39) and, in analogy to the linear
case, to construct simple tests and confidence estimations with the quantiles of
the central t-distribution or with the normal distribution. This was done by Bliss
and James (1968) for hyperbolic models.

It is not clear whether these tests really are $\alpha^*$-tests, whether the confidence
intervals are $(1 - \alpha^*)$-confidence intervals and what the power of those tests is.
Because there is no theoretical solution, such questions can only be answered
by simulation experiments. We here demonstrate the method and in
Section 9.4.3 the verification by simulation experiments.

In Section 9.6 we present results of simulation experiments for special functions.
Heuristically, tests and confidence estimations based on the asymptotic
covariance matrix $\operatorname{var}_A(\theta)$ of the LS estimator $\hat\theta$ can be introduced as follows:

In

$$\operatorname{var}_A(\theta) = (\sigma^2 v_{jk}) \quad (j, k = 1, \dots, p),$$

let us replace $\theta$ by its LS estimator $\hat\theta$ and estimate $\sigma^2$ by

$$s^2 = \frac{R(\hat\theta)}{n - p}$$

with $R(\theta)$ in (9.5). This gives us the estimated asymptotic covariance matrix in
(9.39) now as

$$\widehat{\operatorname{var}}_A(\hat\theta) = (s^2 \hat v_{jk}). \qquad (9.53)$$

To test for an arbitrary $j$ ($j = 1, \dots, p$) the null hypothesis $H_{0j}: \theta_j = \theta_{j0}$ against $H_{Aj}$:
$\theta_j \ne \theta_{j0}$ analogously to the linear case, we propose to use the test statistic

$$t_j = \frac{\hat\theta_j - \theta_{j0}}{s\sqrt{\hat v_{jj}}} \qquad (9.54)$$

to define a test with a nominal risk of the first kind $\alpha_{nom}$,

$$k_j(Y) = \begin{cases} 1, & \text{if } |t_j| > t\left(n - p \mid 1 - \frac{\alpha_{nom}}{2}\right) \\ 0, & \text{otherwise.} \end{cases} \qquad (9.55)$$

A confidence interval concerning the component $\theta_j$ of $\theta$ is analogously
defined as

$$\left[\hat\theta_j - s\sqrt{\hat v_{jj}}\, t\left(n - p \mid 1 - \frac{\alpha_{nom}}{2}\right);\ \hat\theta_j + s\sqrt{\hat v_{jj}}\, t\left(n - p \mid 1 - \frac{\alpha_{nom}}{2}\right)\right]. \qquad (9.56)$$
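The test (9.55) and the interval (9.56) need only the estimate, $s^2$, the diagonal element $\hat v_{jj}$ and a t-quantile. The sketch below takes the quantile as an argument (for instance from a table), because the Python standard library has no t-distribution; all numerical inputs and names are hypothetical.

```python
import math

def asymptotic_t_test(theta_hat_j, theta_j0, s2, v_jj, t_quantile):
    """Test statistic (9.54), decision (9.55) and confidence interval (9.56).

    t_quantile is t(n - p | 1 - alpha_nom / 2), supplied by the caller.
    """
    std_error = math.sqrt(s2 * v_jj)
    t_j = (theta_hat_j - theta_j0) / std_error
    reject = abs(t_j) > t_quantile
    interval = (theta_hat_j - t_quantile * std_error,
                theta_hat_j + t_quantile * std_error)
    return t_j, reject, interval

# Hypothetical example: n = 10, p = 3, alpha_nom = 0.05, t(7 | 0.975) = 2.365 from a table
t_j, reject, interval = asymptotic_t_test(
    theta_hat_j=-0.42, theta_j0=-0.5, s2=0.015, v_jj=0.8, t_quantile=2.365)

print(f"t_j = {t_j:.3f}, reject H0: {reject}")
print(f"confidence interval: [{interval[0]:.3f}; {interval[1]:.3f}]")
```

The test rejects exactly when the hypothesised value lies outside the interval, which is the duality between tests and confidence estimations used in this section.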
Schmidt (1979) proposed to use in place of (9.54) a z-test statistic

$$z_j = \frac{\hat\theta_j - \theta_{j0}}{\sigma\sqrt{v_{jj}}}. \qquad (9.57)$$

But the corresponding test is often not recommendable if $n < 20$, and just these
cases are often of interest.


9.4.3 Simulation Experiments to Check Asymptotic Tests 
and Confidence Estimations 


In mathematics, if we cannot obtain results in an analytic way, we are in the
same situation as scientists in empirical sciences. The most important means
of knowledge acquisition in empirical sciences is the experiment (a trial). To
get statements with pregiven precision from experiments, an experiment has
to be planned. Experiments in statistics are often based on simulated samples;
we could speak about empirical mathematics. How important such an
approach has become in the meantime can be seen from the fact that in
2016 the Eighth International Workshop on Simulation was held in Vienna;
the series of workshops started in May 1994 in St. Petersburg (Russia) (see also
Chapter 1).

Most information below is based on a research project carried out by more than 20
statisticians during the years 1980-1990. A summary of the robustness results is given in
Rasch and Guiard (2004).

The number of samples (simulations), also called runs, has to be derived, in the same way
as in real experiments, in dependency on precision requirements
demanded in advance.

If it is the aim of a simulation experiment to determine the risk of the first kind
$\alpha^*$ of tests or the confidence level $1 - \alpha^*$ of a confidence estimation based on
asymptotic distributions, we fix a nominal value $\alpha_{nom}$, called nominal $\alpha^*$, in
the t-quantile of the tests or the confidence estimator. By simulating the
situation of the null hypothesis repeatedly, we count the relative frequency of
rejecting the null hypothesis (wrongly of course) and call it the actual risk of the first
kind $\alpha_{act}$. We recall Definition 3.8 of 'robustness'.

If a probability $\alpha_{act}$ shall be estimated by a confidence interval so that the
expected half-width of that interval is not above 0.005, then we need about
$N = 10\,000$ runs if $\alpha_{act} = 0.05$. Simulation experiments in Section 9.6 are therefore
based on 10 000 runs. In the parameter space $\Omega$, we define a subspace of
practically relevant parameter values $\theta^{(r)}$ ($r = 1, \dots, R$), and for each $r$ a simulation
experiment with $N = 10\,000$ runs was made. (Components of $\theta$ with no
influence on the method have been fixed at some arbitrary value.) If a method
is $100(1 - \varepsilon)\%$ robust (in the sense of Definition 3.8) for the $R$ extreme points,
then we argue that this is also the case in the practically relevant part of $\Omega$.
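The precision statement for the number of runs can be checked with the usual normal approximation for a relative frequency. The sketch below assumes a confidence level of 0.95 for the interval around $\alpha_{act}$, which the text does not state explicitly; the function names are ours.

```python
import math
from statistics import NormalDist

def half_width(alpha_act, runs, confidence=0.95):
    """Approximate half-width of a confidence interval for a probability alpha_act."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * math.sqrt(alpha_act * (1.0 - alpha_act) / runs)

def required_runs(alpha_act, max_half_width, confidence=0.95):
    """Smallest number of runs with half-width not above max_half_width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil(z ** 2 * alpha_act * (1.0 - alpha_act) / max_half_width ** 2)

print(f"half-width for N = 10000 runs: {half_width(0.05, 10_000):.4f}")
print(f"runs needed for half-width 0.005: {required_runs(0.05, 0.005)}")
```

Under this assumption $N = 10\,000$ runs satisfy the requirement with some margin; a higher confidence level would need more runs.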
Let $\theta^*$ be an arbitrary one of these vectors $\theta^{(r)}$ with $\theta^* = (\theta_1^*, \dots, \theta_p^*)^T$. Due to the
connections between confidence estimations and tests, we restrict ourselves
to tests in the following (Chapter 3).

The hypothesis $H_{0j}: \theta_j = \theta_j^* = \theta_{j0}$ has to be tested against $H_{Aj}: \theta_j \ne \theta_{j0}$ with
the test statistic (9.54). For each of the 10 000 runs, we use the same sample
size $n \ge p + 1$ for each test and add to the function values $f(x_i, \theta^*)$ ($i = 1, \dots, n$) at $n$
fixed support points $x_i$ in $[x_l, x_u]$ pseudorandom numbers $e_i$ from a distribution
with expectation 0 and variance $\sigma^2$. Then for each $i$

$$y_i = f(x_i, \theta^*) + e_i \quad (i = 1, \dots, n;\ x_i \in [x_l, x_u])$$

is a simulated observation. From the $n$ simulated observations we then calculate
the LS estimate $\hat\theta$, the estimate $s^2$ of $\sigma^2$ and the test statistic (9.54). We obtain
10 000 estimates $\hat\theta$ and $s^2$ and calculate the empirical means, variances,
covariances, skewness and kurtosis of the components of $\hat\theta$ and of $s^2$. Then we count
how often in the 10 000 runs for a test statistic $t_j$ from (9.54)

$$t_j < -t\left(n - p \mid 1 - \frac{\alpha_{nom}}{2}\right); \quad -t\left(n - p \mid 1 - \frac{\alpha_{nom}}{2}\right) \le t_j \le t\left(n - p \mid 1 - \frac{\alpha_{nom}}{2}\right)$$

and

$$t_j > t\left(n - p \mid 1 - \frac{\alpha_{nom}}{2}\right)$$

occurred (the null hypothesis was always correct). The number of rejections divided by
10 000 gives an estimate of $\alpha_{act}$. Further 10 000 runs to test $H_{0j}: \theta_j = \theta_j^* + \Delta_j$ with
three values of $\Delta_j$ have been performed to get information about the power. Most simulation
experiments used, besides normally distributed $e_i$, also error terms $e_i$ with the
following pairs of skewness $\gamma_1$ and kurtosis $\gamma_2$

$\gamma_1$   0     1     0      1.5    0   2
$\gamma_2$   1.5   1.5   3.75   3.75   7   7

to investigate the robustness of statistical methods against non-normality. For
the generation of pseudorandom numbers with these moments, we used the
following distribution system.

Definition 9.11 A distribution belongs to the Fleishman system $H$ (Fleishman,
1978) if its first four moments exist and if it is the distribution of the transform

$$y = a + bx + cx^2 + dx^3$$

where $x$ is a standard normal random variable (with mean 0 and variance 1).
By a proper choice of the coefficients $a$, $b$, $c$ and $d$, the random variable $y$ can be
given any admissible quadruple of first four moments $(\mu, \sigma^2, \gamma_1, \gamma_2)$. For instance, any normal
distribution (i.e. any element of $G$) with mean $\mu$ and variance $\sigma^2$ can be
represented as a member of the Fleishman system by choosing $a = \mu$, $b = \sigma$ and
$c = d = 0$. This shows that we really have $H \supset G$ as demanded in Definition 3.8.
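The moments of a member of the Fleishman system follow exactly from the moments of the standard normal distribution. The sketch below computes mean, variance, skewness and kurtosis of $y = a + bx + cx^2 + dx^3$ for given coefficients; it assumes that $\gamma_2$ denotes the excess kurtosis (0 for the normal distribution) and does not solve the inverse problem of finding coefficients for prescribed moments. Function names are ours.

```python
def normal_moment(m):
    """E[x^m] for a standard normal x: 0 for odd m, (m - 1)!! for even m."""
    if m % 2 == 1:
        return 0.0
    result = 1.0
    for k in range(m - 1, 0, -2):
        result *= k
    return result

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def expectation(poly):
    """Expectation of a polynomial in a standard normal variable."""
    return sum(coef * normal_moment(m) for m, coef in enumerate(poly))

def fleishman_moments(a, b, c, d):
    """Mean, variance, skewness and excess kurtosis of y = a + b*x + c*x^2 + d*x^3."""
    mean = expectation([a, b, c, d])
    centred = [a - mean, b, c, d]
    power, central = [1.0], []
    for _ in range(4):
        power = poly_mul(power, centred)
        central.append(expectation(power))
    variance = central[1]
    skewness = central[2] / variance ** 1.5
    kurtosis = central[3] / variance ** 2 - 3.0
    return mean, variance, skewness, kurtosis

# Normal special case a = mu, b = sigma, c = d = 0
print(fleishman_moments(2.0, 3.0, 0.0, 0.0))
# Coefficients with c and d different from zero (arbitrary illustrative values)
print(fleishman_moments(-0.1, 0.9, 0.1, 0.02))
```

Pseudorandom numbers from such a distribution are obtained by applying the transform to standard normal pseudorandom numbers.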

Nowadays we have convenient computer packages for simulation. A package 
described in Yanagida (2017) demonstrates simulation tools and the package 
Yanagida (2016) allows the generation of any member of the Fleishman system 
H. More information about statistics and simulation can be found in Rasch and 
Melas (2017). 

Results of simulation experiments for several regression functions are given in 
Section 9.6. 


9.5 Optimal Experimental Design 


We recall the definition of experimental design problems in Section 1.5 but
do not consider cost. If we use a quadratic loss function based on $R(\theta)$ in (9.5)
and, as the statistical approach, point estimation of $\theta$ with the LS
estimator $\hat\theta_n$ in Definition 9.2, the choice of a suitable risk function is the next step.
A good overview of the choice of risk functions is given in Melas (2008).

A functional of the covariance matrix of $\hat\theta$ cannot be used because this matrix is
unknown. We may choose a risk function based on the asymptotic covariance
matrix $\operatorname{var}_A(\theta)$ in (9.39) or on the approximate covariance matrix of Clarke
(1980) or on an asymptotic covariance matrix derived from asymptotic
expansions of higher order (see Pazman, 1985).

We use the first possibility, for which many results can already be found. First
we consider the optimal choice of the support points for a given number $n$ of
measurements and later give hints for the minimal choice of $n$ in such a way that
the value of the risk function is just below a given bound. A disadvantage of
experimental design in intrinsically non-linear regression is the fact that the
optimal design depends on at least one value of the unknown parameter vector.
For practical purposes we proceed as follows. We use a priori knowledge $\theta_0$
about $\theta$, defining a region $U(\theta_0)$ in which the parameter $\theta$ is conjectured to lie. Then
we determine the optimal design at that value $\theta \in U(\theta_0)$ that leads to the maximal
risk of the optimal designs in $U(\theta_0)$. The size $n$ of the experiment at this place
gives an upper bound for the risk in $U(\theta_0)$, because the position of the support
points often depends only slightly on $\theta$ (see Rasch, 1993). This must be checked
for each special function separately, as done in Section 9.6.


Definition 9.12 A scheme

$$V_{n,m} = \begin{pmatrix} x_1 & \cdots & x_m \\ n_1 & \cdots & n_m \end{pmatrix}, \quad x_i \in [x_l, x_u],\ n_i \text{ integer},\ \sum_{i=1}^{m} n_i = n$$

is called a concrete m-point design or an m-point design for short with the
support $S_m = (x_1, \dots, x_m)$ and the allocation $N_m = (n_1, \dots, n_m)$. The interval
$[x_l, x_u]$ is called the experimental region.
If a special regression function

$$f(x, \theta), \quad x \in [x_l, x_u],\ \theta \in \Omega \subset R^p$$

is given, then $V_{n,m}$ is an element of the set of all possible concrete designs:

$$\mathcal{V}_n = \left\{V_{n,m} \;\middle|\; p \le m \le n,\ \operatorname{card}(S_m) = m,\ \sum_{i=1}^{m} n_i = n,\ n_i \ge 0\right\}.$$

If $Z: \mathcal{V}_n \to R^1$ is a mapping $Z_\theta(V_n) = Z[\operatorname{var}_A(\theta \mid V_n)]$ with $\theta \in \Omega$, $V_n \in \mathcal{V}_n$,
$Z: R^{p \times p} \to R^1$ and the asymptotic covariance matrix (9.38) is written
in dependency on $\theta$ and $V_n$ as

$$\operatorname{var}_A(\theta) = \operatorname{var}_A(\theta \mid V_n),$$

then $V^*_{n,m}$ is called a locally Z-optimal m-point design at $\theta = \theta_0$, if

$$Z_{\theta_0}(V^*_{n,m}) = \inf_{V_{n,m} \in \mathcal{V}_n} Z_{\theta_0}(V_{n,m}). \qquad (9.58)$$

If $\mathcal{V}_{n,m}$ is the set of concrete m-point designs, then $V^*_{n,m}$ is called a concrete locally
Z-optimal m-point design, if

$$Z_{\theta_0}(V^*_{n,m}) = \inf_{V_{n,m} \in \mathcal{V}_{n,m}} Z_{\theta_0}(V_{n,m}). \qquad (9.59)$$

The mapping $V_{n,m} \to \operatorname{var}_A(\theta_0 \mid V_{n,m})$ is symmetric with respect to $S_m$. Therefore
we focus on supports with $x_1 < x_2 < \cdots < x_m$. In place of minimising a functional of the $(p \times p)$-matrix
$\operatorname{var}_A(\theta \mid V_{n,m})$, we can maximise the corresponding functional of its inverse, the so-called asymptotic
information matrix.

Especially $V^*_{n,m}$ for $r = 1, \dots, p + 2$ with the functionals $Z_r$ and the $(p \times p)$-matrix
$M = (m_{ij})$ is called

$Z_r(M) = m_{rr}$ ($r = 1, \dots, p$) locally $C_{\theta_r}$-optimal,
$Z_{p+1}(M) = |M|$ locally D-optimal,
$Z_{p+2}(M) = \operatorname{Sp}(M)$ locally A-optimal,

and in general for $r = 1, \dots, p + 2$ then $Z_r$-optimal.


For some regression functions and optimality criteria, analytical solutions of
the problems could be found in closed form. Otherwise search methods must be
applied.


The first analytical solution can be found in Box and Lucas (1959); it is stated in
the following theorem.

Theorem 9.9 (Box and Lucas, 1959).
For the regression model

$$f(x, \theta) = \alpha + \beta e^{\gamma x} \qquad (9.60)$$

with $n = 3$, $\theta = (\alpha, \beta, \gamma)^T$ and $x \in [x_u, x_o]$, the locally D-optimal concrete design $V_3^*$
depends only on the component $\gamma_0$ of $\theta_0 = (\alpha_0, \beta_0, \gamma_0)^T$ and has the form

$$V_3^* = \begin{pmatrix} x_u & x_2^* & x_o \\ 1 & 1 & 1 \end{pmatrix}$$

with

$$x_2^* = -\frac{1}{\gamma_0} + \frac{x_u e^{\gamma_0 x_u} - x_o e^{\gamma_0 x_o}}{e^{\gamma_0 x_u} - e^{\gamma_0 x_o}}. \qquad (9.61)$$
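The middle support point of the Box and Lucas design can be checked by a simple grid search. The sketch below assumes the closed form $x_2^* = -1/\gamma_0 + (x_u e^{\gamma_0 x_u} - x_o e^{\gamma_0 x_o})/(e^{\gamma_0 x_u} - e^{\gamma_0 x_o})$ for the model $f(x, \theta) = \alpha + \beta e^{\gamma x}$ with $n = p = 3$, where D-optimality means maximising the absolute value of the determinant of the matrix of first derivatives. The parameter values and names are illustrative.

```python
import math

def middle_support_point(gamma0, x_low, x_high):
    """Closed-form middle support point of the locally D-optimal 3-point design."""
    e_low, e_high = math.exp(gamma0 * x_low), math.exp(gamma0 * x_high)
    return -1.0 / gamma0 + (x_low * e_low - x_high * e_high) / (e_low - e_high)

def abs_det_jacobian(points, beta0, gamma0):
    """|det F| for three support points; rows are (1, e^{gamma x}, beta x e^{gamma x})."""
    rows = [(1.0, math.exp(gamma0 * x), beta0 * x * math.exp(gamma0 * x)) for x in points]
    (a, b, c), (d, e, f), (g, h, i) = rows
    return abs(a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g))

# Illustrative prior values and experimental region
beta0, gamma0 = -8.0, -0.5
x_low, x_high = 0.0, 5.0

x_star = middle_support_point(gamma0, x_low, x_high)

# Grid search over the middle point with the two end points fixed
grid = [x_low + 0.001 * k for k in range(1, 5000)]
x_grid = max(grid, key=lambda x: abs_det_jacobian((x_low, x, x_high), beta0, gamma0))

print(f"closed form: {x_star:.4f}, grid search: {x_grid:.4f}")
```

The result does not depend on $\beta_0$, because $\beta_0$ enters the determinant only as a factor.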


Atkinson and Hunter (1968) gave sufficient and, for $n = kp$, sufficient and
necessary conditions on the function $f(x, \theta)$ for the support of a locally D-optimal
design of size $n$ to consist of exactly $p = \dim(\Omega)$ points. These conditions are difficult to verify for $p > 2$.


Theorem 9.10 (Rasch, 1990). 

The support of a concrete locally D-optimal p-point design of size n is 
independent of u; the n; of this design are as equal as possible (i.e. if 1 = ap, then 
n; = a; otherwise n; differ maximal by 1). 


Proof: If $m = p$, the asymptotic covariance matrix in (9.38) is (after dropping the design-independent factor $\sigma^2$) equal to
\[ \mathrm{var}_A(\theta) = [F^T(\theta) F(\theta)]^{-1} = [G^T(\theta) N G(\theta)]^{-1} \]
with $G(\theta) = (f_j(x_i, \theta))$, $i, j = 1, \dots, p$, and $N = \mathrm{diag}(n_1, \dots, n_p)$. Minimising $|[F^T(\theta) F(\theta)]^{-1}|$ means maximising $|G^T(\theta) N G(\theta)|$. For square matrices $A$ and $B$ of the same order, $|AB| = |A||B|$ and $|A| = |A^T|$ always hold; therefore we obtain $|G^T(\theta) N G(\theta)| = |G|^2 |N|$. This completes the proof, because $|G|^2$ can be maximised independently of $N$ and $|N| = \prod_{i=1}^p n_i$ is a maximum if the $n_i$ are equal or as equal as possible.

In Rasch (1990) further theorems concerning the D-optimality can be found.
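The allocation statement of Theorem 9.10 can be checked by brute force for small cases. The sketch below is only an illustration under the assumption that the determinant criterion reduces to maximising the product of the $n_i$ for fixed $n$; the function names are hypothetical.

```python
from itertools import product
from math import prod


def balanced_allocation(n, p):
    """Frequencies as equal as possible: r points get a + 1, the rest get a."""
    a, r = divmod(n, p)
    return [a + 1] * r + [a] * (p - r)


def best_allocation_by_search(n, p):
    """Exhaustive search for positive integers n_1..n_p with sum n and maximal product."""
    best, best_value = None, -1
    for alloc in product(range(1, n + 1), repeat=p):
        if sum(alloc) == n and prod(alloc) > best_value:
            best, best_value = alloc, prod(alloc)
    return sorted(best, reverse=True)


if __name__ == "__main__":
    for n in (12, 13, 14):
        print(n, balanced_allocation(n, 3), best_allocation_by_search(n, 3))
```

For $n = 14$ and $p = 3$ both functions return the frequencies 5, 5, 4, the pattern that appears in the D-optimal rows of Table 9.11.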


Theorem 9.11 Let $f(x, \theta)$ be an intrinsically non-linear regression function, $x \in R$, $\theta \in \Omega \subset R^p$, with the non-linearity parameter $\varphi = (\theta_{i_1}, \dots, \theta_{i_r})^T$, $0 < r < p$, in Definition 9.1 and let $F$ be non-singular. Then the concrete D-optimal design of size $n = p$ depends only on $\varphi = (\theta_{i_1}, \dots, \theta_{i_r})^T$, $0 < r < p$, and not on the linearity parameters.


Proof: In Definition 9.1,
\[ \frac{\partial f(x, \theta)}{\partial \theta} = C(\theta)\, g(x, \varphi) \]
with $g(x, \varphi) = (g_1(x, \varphi), \dots, g_p(x, \varphi))^T$. If we put $G = (g_j(x_i, \varphi))$, then
\[ |F^T F| = |C(\theta) G^T G C^T(\theta)| = |C(\theta)| \, |C^T(\theta)| \, |G^T G| = |C(\theta)|^2 |G^T G|. \]
$|F^T F|$ is maximal if $|G^T G|$ is maximal, and $G$ depends only on $\varphi = (\theta_{i_1}, \dots, \theta_{i_r})^T$, $0 < r < p$.


If $n > 2p$, the D-optimal concrete designs are approximately G-optimal in the sense that the value of the G-criterion for the concrete D-optimal $p$-point design, even for $n \neq tp$ ($t$ integer), hardly differs from that of the concrete G-optimal design. For the functions in (9.6), we found optimal designs by search methods, and for $n > p + 2$ we often found $p$-point designs. If we search for D-optimal designs in the class of $p$-point designs, then $\mathrm{var}_A(\theta)$ in (9.39) becomes
\[ \mathrm{var}_A(\theta) = [B^T \mathrm{diag}(n_1, \dots, n_p) B]^{-1} \sigma^2 \]
because
\[ V_n = \begin{pmatrix} x_1 & \dots & x_p \\ n_1 & \dots & n_p \end{pmatrix}, \quad F^T F = \left( \sum_{i=1}^n f_j(x_i, \theta) f_k(x_i, \theta) \right), \quad j, k = 1, \dots, p, \]
and
\[ \sum_{i=1}^n f_j(x_i, \theta) f_k(x_i, \theta) = \sum_{l=1}^p n_l f_j(x_l, \theta) f_k(x_l, \theta) \]
with
\[ B = (f_j(x_l, \theta)), \quad l, j = 1, \dots, p. \]

Theorem 9.12 The minimal experimental size ,,;, so that for a given K > 0 
\vara(0)| < K|BI? 

can be determined as follows. Find the smallest positive integer z with 


1 
WK 


In the case of equality above 1,;, = pz. Otherwise calculate the largest integer r, 
so that 


Amin = pZ—1. 
Proof: From the proof of Theorem 9.10 we know that
\[ |\mathrm{var}_A(\theta)| = \frac{\sigma^{2p}}{|B|^2 \prod_{i=1}^p n_i} \]
for D-optimal $p$-point designs; the remainder of the proof is left to the reader.


9.6 Special Regression Functions 


In this section we discuss some regression functions that are important in applications in the biosciences and in engineering. Each of the special functions is treated by the same approach: we determine the asymptotic covariance matrix, determine the experimental size $n_0$ for testing parameters and determine parts of the parameter space for which the actual risk of the first kind of a test is between 0.04 and 0.06 if the nominal risk is $\alpha_{nom} = 0.05$.


9.6.1 Exponential Regression 


The exponential regression is discussed extensively as a kind of pattern; the 
other functions follow the same scheme, but their treatment is shorter. 

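Model (9.2) is called the model of the exponential regression if $f_E(x, \theta)$ is given by (9.60). The derivative of $f_E(x, \theta)$ with respect to $\theta$ $(\theta = (\alpha, \beta, \gamma)^T)$ is
\[ \frac{\partial f_E(x, \theta)}{\partial \theta} = \begin{pmatrix} 1 \\ e^{\gamma x} \\ \beta x e^{\gamma x} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \beta \end{pmatrix} \begin{pmatrix} 1 \\ e^{\gamma x} \\ x e^{\gamma x} \end{pmatrix}, \tag{9.62} \]
so that $\gamma$ by Definition 9.1 is a non-linearity parameter.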


9.6.1.1 Point Estimator 
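For $R(\theta)$ in (9.5) we get
\[ R(\theta) = \sum_{i=1}^n \left( y_i - \alpha - \beta e^{\gamma x_i} \right)^2 \]
and determine $\hat{\theta}$ so that
\[ R(\hat{\theta}) = \min_{\theta} R(\theta). \]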
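With the abbreviations
\[ A = \sum_{i=1}^n e^{\gamma x_i}, \quad B = \sum_{i=1}^n x_i e^{\gamma x_i}, \quad C = \sum_{i=1}^n e^{2\gamma x_i}, \quad D = \sum_{i=1}^n x_i e^{2\gamma x_i}, \quad E = \sum_{i=1}^n x_i^2 e^{2\gamma x_i}, \tag{9.63} \]
it follows
\[ |F^T F| = \beta^2 \{ n(CE - D^2) + 2ABD - B^2 C - A^2 E \} = \beta^2 \Delta, \]
so that $F^T F$ can be non-singular only if $\beta \neq 0$. For fixing $\Omega_0$ we choose either $\beta > 0$ or $\beta < 0$, depending on the practical problem. For growth processes, $\gamma < 0$ immediately implies $\beta < 0$; in this case we choose
\[ \Omega_0 \subset R^1 \times R^- \times R^- \subset \Omega \subset R^3. \]
The region $\Omega_0$ must be chosen so that the assumptions V2 and V3 in Section 9.1.1 are fulfilled.

The inverse of $F^T F$ has the form
\[ (F^T F)^{-1} = \frac{1}{\Delta} \begin{pmatrix} CE - D^2 & BD - AE & \frac{1}{\beta}(AD - BC) \\ BD - AE & nE - B^2 & \frac{1}{\beta}(AB - nD) \\ \frac{1}{\beta}(AD - BC) & \frac{1}{\beta}(AB - nD) & \frac{1}{\beta^2}(nC - A^2) \end{pmatrix}. \tag{9.64} \]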
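A closed-form inverse such as (9.64) can be verified numerically by multiplying it with the information matrix built directly from the gradient $(1, e^{\gamma x}, \beta x e^{\gamma x})^T$. The following sketch uses only the Python standard library; the function names and the chosen parameter values ($\beta = -17$, $\gamma = -0.14$, $x = 1, \dots, 12$) are assumptions made for illustration.

```python
import math


def abbreviations(x, gamma):
    """Return n and the sums A, B, C, D, E of (9.63)."""
    e = [math.exp(gamma * xi) for xi in x]
    A = sum(e)
    B = sum(xi * ei for xi, ei in zip(x, e))
    C = sum(ei ** 2 for ei in e)
    D = sum(xi * ei ** 2 for xi, ei in zip(x, e))
    E = sum(xi ** 2 * ei ** 2 for xi, ei in zip(x, e))
    return len(x), A, B, C, D, E


def information_matrix(x, beta, gamma):
    """F^T F computed directly from the rows (1, e^{gamma x}, beta x e^{gamma x})."""
    rows = [(1.0, math.exp(gamma * xi), beta * xi * math.exp(gamma * xi)) for xi in x]
    return [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]


def closed_form_inverse(x, beta, gamma):
    """Inverse of F^T F according to the closed form (9.64)."""
    n, A, B, C, D, E = abbreviations(x, gamma)
    delta = n * (C * E - D * D) + 2 * A * B * D - A * A * E - B * B * C
    m = [[C * E - D * D, B * D - A * E, (A * D - B * C) / beta],
         [B * D - A * E, n * E - B * B, (A * B - n * D) / beta],
         [(A * D - B * C) / beta, (A * B - n * D) / beta, (n * C - A * A) / beta ** 2]]
    return [[v / delta for v in row] for row in m]


def matmul(P, Q):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


if __name__ == "__main__":
    x = list(range(1, 13))
    for row in matmul(information_matrix(x, -17.0, -0.14), closed_form_inverse(x, -17.0, -0.14)):
        print([round(v, 8) for v in row])
```

The product should be the identity matrix up to rounding error.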


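Next we describe a method of Verhagen (1960) that can be used to find initial values for the iterative calculation of the LS estimators by the Gauss-Newton method. We start with the integrals
\[ I(x_i) = \int_0^{x_i} (\alpha + \beta e^{\gamma t})\, dt = \alpha x_i + \frac{\eta_i - \alpha}{\gamma} - \frac{\beta}{\gamma} \]
with $\eta_i = \alpha + \beta e^{\gamma x_i}$ $(i = 1, \dots, n)$ and approximate them by
\[ T_i = T(x_i) = \frac{1}{2} \sum_{j=2}^{i} (y_{j-1} + y_j)(x_j - x_{j-1}) \quad (i = 2, \dots, n). \]
Now we put
\[ \eta_i \approx \gamma T_i - \alpha\gamma x_i + \alpha + \beta \quad (i = 2, \dots, n) \]
and estimate the parameters of the approximate linear model
\[ y_i = \gamma T_i - \alpha\gamma x_i + \alpha + \beta + e_i \quad (i = 2, \dots, n) \]
with the methods of Chapter 8. The LS estimators are
\[ c_V = \frac{SP_{Ty} SS_x - SP_{Tx} SP_{xy}}{SS_T SS_x - SP_{Tx}^2}, \quad a_V = \frac{c_V SP_{Tx} - SP_{xy}}{c_V SS_x} \]
and, because the intercept of the linear model is $\alpha + \beta$ and the coefficient of $x_i$ is $-\alpha\gamma$,
\[ b_V = \bar{y} - c_V \bar{T} + a_V (c_V \bar{x} - 1) \]
with
\[ SP_{uv} = \sum_{i=2}^n u_i v_i - \frac{1}{n-1} \left( \sum_{i=2}^n u_i \right) \left( \sum_{i=2}^n v_i \right), \quad SS_u = SP_{uu}, \]
and the arithmetic means $\bar{y}$, $\bar{T}$ and $\bar{x}$ of the $n - 1$ values $y_i$, $T_i$ and $x_i$ for $i = 2, \dots, n$, respectively.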
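The initial-value method can be sketched in a few lines. The code below is a hedged illustration, not the implementation used in CADEMO: it forms the trapezoidal sums $T_i$, fits the linear model in $T_i$ and $x_i$, and recovers $\gamma$ from the coefficient of $T_i$, $\alpha$ from the coefficient $-\alpha\gamma$ of $x_i$ and $\beta$ from the intercept $\alpha + \beta$. The function name is hypothetical, and the sketch assumes $x_1 = 0$ so that the trapezoidal sums approximate the integral from 0.

```python
import math


def verhagen_initial_values(x, y):
    """Initial values (a, b, c) for alpha + beta*exp(gamma*x) from trapezoidal sums."""
    n = len(x)
    # Cumulative trapezoidal sums T_2, ..., T_n.
    T, acc = [], 0.0
    for j in range(1, n):
        acc += 0.5 * (y[j - 1] + y[j]) * (x[j] - x[j - 1])
        T.append(acc)
    xs, ys = x[1:], y[1:]          # the n - 1 values for i = 2, ..., n
    m = n - 1

    def sp(u, v):
        """Centred sum of products SP_uv over the n - 1 values."""
        return sum(ui * vi for ui, vi in zip(u, v)) - sum(u) * sum(v) / m

    ss_x, ss_t = sp(xs, xs), sp(T, T)
    sp_tx, sp_ty, sp_xy = sp(T, xs), sp(T, ys), sp(xs, ys)
    c = (sp_ty * ss_x - sp_tx * sp_xy) / (ss_t * ss_x - sp_tx ** 2)
    a = (c * sp_tx - sp_xy) / (c * ss_x)
    b = sum(ys) / m - c * sum(T) / m + a * (c * sum(xs) / m - 1.0)
    return a, b, c


if __name__ == "__main__":
    # Noise-free illustration with assumed parameters alpha=16, beta=-16, gamma=-0.14.
    x = [0.05 * i for i in range(201)]
    y = [16.0 - 16.0 * math.exp(-0.14 * xi) for xi in x]
    print(verhagen_initial_values(x, y))
```

With noisy data the values are only rough starting points for the Gauss-Newton iteration.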


9.6.1.2 Confidence Estimations and Tests 


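Let the assumptions of Section 9.3 be given. The asymptotic covariance matrix
\[ \mathrm{var}_A(\theta) = \sigma^2 (F^T F)^{-1} \]
with $(F^T F)^{-1}$ in (9.64) and the abbreviations (9.63) can be used for the construction of confidence intervals for $\alpha$, $\beta$ and $\gamma$ and of tests of hypotheses about $\alpha$, $\beta$ and $\gamma$.

Following Section 9.5 we test
\[ H_{0\alpha}: \alpha = \alpha_0 \quad \text{against} \quad H_{A\alpha}: \alpha \neq \alpha_0 \]
with the test statistic
\[ t_\alpha = \frac{(a - \alpha_0) \sqrt{\hat{\Delta}}}{s \sqrt{\hat{C}\hat{E} - \hat{D}^2}}. \tag{9.65} \]
Further
\[ H_{0\beta}: \beta = \beta_0 \quad \text{against} \quad H_{A\beta}: \beta \neq \beta_0 \]
is tested with
\[ t_\beta = \frac{(b - \beta_0) \sqrt{\hat{\Delta}}}{s \sqrt{n\hat{E} - \hat{B}^2}} \tag{9.66} \]
and
\[ H_{0\gamma}: \gamma = \gamma_0 \quad \text{against} \quad H_{A\gamma}: \gamma \neq \gamma_0 \]
with
\[ t_\gamma = \frac{(c - \gamma_0)\, b \sqrt{\hat{\Delta}}}{s \sqrt{n\hat{C} - \hat{A}^2}}. \tag{9.67} \]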


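$\hat{A}, \dots, \hat{E}$ in the formulae of the test statistics are obtained from $A, \dots, E$ in (9.63) by replacing there the parameter $\gamma$ by its estimator $\hat{\gamma} = c$. Further
\[ \Delta = n(CE - D^2) + 2ABD - A^2 E - B^2 C, \]
and here also we obtain $\hat{\Delta}$ from $\Delta$ by replacing $\gamma$ by $c$. Finally $s$ is the square root of
\[ s^2 = \frac{1}{n - 3} \sum_{i=1}^n \left( y_i - a - b e^{c x_i} \right)^2, \]
the estimator of $\sigma^2$.

The tests have the form
\[ k_l(Y) = \begin{cases} 1, & \text{if } |t_l| > t(n - 3 \mid 1 - \alpha_{nom}/2), \quad l = \alpha, \beta, \gamma, \\ 0, & \text{otherwise} \end{cases} \]
with the $(1 - \alpha_{nom}/2)$-quantile of the central $t$-distribution with $n - 3$ degrees of freedom. Here $\alpha_{nom}$ is the nominal risk of the first kind of the tests. Confidence intervals with a nominal confidence coefficient $1 - \alpha_{nom}$ are defined as follows, putting $t(n - 3 \mid 1 - \alpha_{nom}/2) = T(n, \alpha_{nom})$.

Parameter $\alpha$:
\[ \left[ a - s \frac{\sqrt{\hat{C}\hat{E} - \hat{D}^2}}{\sqrt{\hat{\Delta}}} T(n, \alpha_{nom}),\ a + s \frac{\sqrt{\hat{C}\hat{E} - \hat{D}^2}}{\sqrt{\hat{\Delta}}} T(n, \alpha_{nom}) \right] \]
Parameter $\beta$:
\[ \left[ b - s \frac{\sqrt{n\hat{E} - \hat{B}^2}}{\sqrt{\hat{\Delta}}} T(n, \alpha_{nom}),\ b + s \frac{\sqrt{n\hat{E} - \hat{B}^2}}{\sqrt{\hat{\Delta}}} T(n, \alpha_{nom}) \right] \]
Parameter $\gamma$:
\[ \left[ c - s \frac{\sqrt{n\hat{C} - \hat{A}^2}}{|b| \sqrt{\hat{\Delta}}} T(n, \alpha_{nom}),\ c + s \frac{\sqrt{n\hat{C} - \hat{A}^2}}{|b| \sqrt{\hat{\Delta}}} T(n, \alpha_{nom}) \right] \]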
Example 9.7 Let us consider a numerical example. Table 9.3 shows the
growth of leaf surfaces of oil palms observed in Indonesia.


Program Hint 

Most calculations and graphs in Section 9.6 have been done by our own special 
program Growth, which can be found in the program package CADEMO (see 
http://www.swmath.org/software/1144). The program determines initial values 
for the iteration to calculate the LS estimates from the data. In Figure 9.10 the LS 
estimates with the asymptotic confidence intervals and the estimated residual 
variance are given. In Figure 9.11 the curve of the estimated regression function 
together with the scatter plot of the observation is given. 


Table 9.3 Leaf surface $y_i$ in m² of oil palms on a trial area in dependency of age $x_i$ in years.

x_i   1     2     3     4     5     6     7     8     9      10     11     12
y_i   2.02  3.62  5.71  7.13  8.33  8.29  9.81  11.3  12.18  12.67  12.62  13.01


[CADEMO Growth output for oilpalm.dat: estimation of the growth parameters with automatic calculation of the initial values. Growth function Exponential(3), F(x) = A + B*exp(C*x); least squares estimates, confidence limits (α = 0.05) and estimated residual variance for sample size n = 12.]

Parameter   Estimate   Lower limit   Upper limit
A            16.4750     13.3633       19.5646
B           -16.6501    -19.0181      -14.2820
C            -0.1380     -0.1957       -0.0802
s² = 0.2001


Figure 9.10 LS estimates, asymptotic confidence intervals and estimated residual variance of 
the exponential regression with data of Example 9.7. 
[Plot of f(x) = A + B*exp(C*x) against x for Expo(3) with A = 16.4790, B = -16.6501 and C = -0.1380.]


Figure 9.11 Curve of the estimated exponential regression function together with the 
scatter plot of the observations of Example 9.7. 
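The residual variance and the half-widths of the asymptotic confidence intervals can be recomputed from the data of Table 9.3. The sketch below is an assumption-laden illustration: it takes the estimates a = 16.479, b = -16.6501, c = -0.1380 from the CADEMO output as given (no fitting is done), uses the asymptotic variances implied by (9.64) and expects the $t$-quantile as an input (2.262 is the 0.975-quantile for 9 degrees of freedom). The function names are hypothetical.

```python
import math

X = list(range(1, 13))
Y = [2.02, 3.62, 5.71, 7.13, 8.33, 8.29, 9.81, 11.3, 12.18, 12.67, 12.62, 13.01]


def residual_variance(x, y, a, b, c):
    """s^2 = RSS / (n - 3) for the fitted curve a + b*exp(c*x)."""
    rss = sum((yi - a - b * math.exp(c * xi)) ** 2 for xi, yi in zip(x, y))
    return rss / (len(x) - 3)


def asymptotic_half_widths(x, y, a, b, c, t_quantile):
    """Half-widths of the asymptotic confidence intervals for alpha, beta, gamma."""
    n = len(x)
    e = [math.exp(c * xi) for xi in x]
    A = sum(e)
    B = sum(xi * ei for xi, ei in zip(x, e))
    C = sum(ei ** 2 for ei in e)
    D = sum(xi * ei ** 2 for xi, ei in zip(x, e))
    E = sum(xi ** 2 * ei ** 2 for xi, ei in zip(x, e))
    delta = n * (C * E - D ** 2) + 2 * A * B * D - A ** 2 * E - B ** 2 * C
    s = math.sqrt(residual_variance(x, y, a, b, c))
    return (t_quantile * s * math.sqrt((C * E - D ** 2) / delta),
            t_quantile * s * math.sqrt((n * E - B ** 2) / delta),
            t_quantile * s * math.sqrt((n * C - A ** 2) / delta) / abs(b))


if __name__ == "__main__":
    a, b, c = 16.479, -16.6501, -0.1380
    print(round(residual_variance(X, Y, a, b, c), 4))
    print(asymptotic_half_widths(X, Y, a, b, c, 2.262))
```

The residual variance should agree with the value 0.2001 shown in Figure 9.10.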


9.6.1.3. Results of Simulation Experiments 
For the exponential regression, we perform the simulation experiments described in Section 9.4.3. The questions are as follows:

• Is the bias of a, b and c important?
• Do the asymptotic variances differ from the empirical ones?
• Is the denominator $n - 3$ of the estimator of $\sigma^2$ appropriate?

The results in Rasch et al. (2008) and Rasch and Schimke (1983) for equidistant $x_i \in [0, 65]$, $i = 1, \dots, n$, and $n = 4, 6, 14$ as well as 12 $(\beta, \gamma)$-combinations are summarised below. W.l.o.g. we choose $\alpha = 0$ and further $\sigma^2 = 1$. The number of runs was 5000. In each run $\alpha$, $\beta$ and $\gamma$ have been estimated, and from the 5000 estimates, the empirical means $\bar{a}$, $\bar{b}$ and $\bar{c}$ and the empirical variances $s_a^2$, $s_b^2$ and $s_c^2$ and covariances have been calculated.

Table 9.4 shows the empirical bias $v_{E,n}$, representing $\bar{a} - \alpha$, $\bar{b} - \beta$ and $\bar{c} - \gamma$ for $n = 4$, 6 and 14, in comparison with the approximate bias $v_n(\theta)$ calculated by (9.42).

To calculate $v_n(\theta)$ by (9.42), we use the notations of (9.4) and the vector $F_i(\theta) = (1, e^{\gamma x_i}, \beta x_i e^{\gamma x_i})^T$ and the inverse $(F^T F)^{-1}$ from (9.64). For $K_i(\theta)$ we get
\[ K_i(\theta) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & x_i e^{\gamma x_i} \\ 0 & x_i e^{\gamma x_i} & \beta x_i^2 e^{\gamma x_i} \end{pmatrix}. \]




Table 9.4 Empirical bias $v_{E,n}$ from 5000 simulated samples of size $n$ and approximate bias $v_n$ from (9.42) of the LS estimates of the parameters $\alpha$, $\beta$ and $\gamma$ of the exponential regression for $n = 4$, 6 and 14 and $\sigma^2 = 1$.


a B Y 
-B -107y n VE,n Vn —-VE,n —V_ -1 OF ve, i -107v, 
30 3 4 0.520 0.523 0.252 0.526 0.419 0.251 
0.644 0.614 0.625 0.622 0.131 0.134 
14 0.263 0.238 0.287 0.248 0.003 0.055 
5 4 0.147 0.137 0.135 0.139 0.441 0.470 
0.125 0.102 0.142 0.107 0.096 0.215 
14 0.128 0.057 0.166 0.066 -0.170 0.084. 
7 4 0.055 0.070 0.059 0.071 1.139 0.990 
0.052 0.048 0.048 0.052 0.338 0.363 
14 0.027 0.026 0.035 0.035 0.120 0.129 
9 4 0.035 0.047 0.035 0.048 2.821 2.184 
0.002 0.031 0.019 0.033 0.685 0.610 
14 0.012 0.016 -0.025 0.025 0.320 0.189 
50 3 4 0.310 0.314. 0.323 0.316 0.117 0.091 
0.279 0.249 0.307 0.253 0.048 0.048 
14 0.210 0.143 0.190 0.149 0.058 0.020 
5 4 0.070 0.082 0.041 0.083 0.236 0.169 
0.077 0.061 0.090 0.064. 0.050 0.077 
14 0.025 0.034. 0.04.2 0.040 0.035 0.030 
7: 4 0.030 0.042 0.048 0.042 0.358 0.356 
0.045 0.029 0.033 0.031 0.011 0.131 
14 0.010 0.016 0.032 0.021 0.122 0.047 
9 4 0.023 0.028 0.039 0.029 0.888 0.786 
0.020 0.018 0.023 0.020 0.182 0.219 
14 0.001 0.010 0.021 0.015 0.077 0.068 
70 3 4 0.301 0.224. 0.297 0.225 0.015 0.045 
0.124. 0.178 0.125 0.181 0.054. 0.025 
14 0.021 0.102 0.173 0.106 0.106 0.010 
5 4 0.075 0.059 0.081 0.059 0.079 0.086 
0.054. 0.044. 0.154. 0.046 0.000 0.040 


14 0.022 0.025 0.061 0.028 0.044. 0.015 
Table 9.4 (Continued)


a B Y 
-B -1 07 n VE,n Vn -VE,n —Vn -1 07VE,n -10°v, 

7 4 0.027 0.029 0.054. 0.030 0.194 0.182 

0.020 0.021 0.014 0.022 0.043 0.067 

14 0.038 0.011 0.027 0.015 0.008 0.024 

9 4 0.020 0.020 0.018 0.021 0.462 0.401 

0.026 0.013 0.020 0.014 0.071 0.112 

14 0.029 0.007 0.038 0.011 0.060 0.035 


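Adding to (9.63) the abbreviations
\[ G = \sum_{i=1}^n x_i^2 e^{\gamma x_i}, \quad H = \sum_{i=1}^n x_i^3 e^{2\gamma x_i}, \]
because of $\sigma^2 = 1$ and (9.64) we obtain
\[ \mathrm{tr}\{ (F^T F)^{-1} K_i(\theta) \} = \frac{1}{\beta\Delta} \left( 2(AB - nD) x_i e^{\gamma x_i} + (nC - A^2) x_i^2 e^{\gamma x_i} \right) \]
and finally
\[ v_n(\theta) = -\frac{1}{2\beta\Delta} (F^T F)^{-1} \begin{pmatrix} 2B(AB - nD) + G(nC - A^2) \\ 2D(AB - nD) + E(nC - A^2) \\ 2\beta E(AB - nD) + H\beta(nC - A^2) \end{pmatrix}. \tag{9.68} \]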


We see in Tables 9.5 and 9.6 that the empirical variances do not differ strongly from the main diagonal elements of the asymptotic covariance matrix even for $n = 4$.

The choice of the denominator $n - 3$ in the estimate $s^2$ of $\sigma^2$ is analogous to the linear case. There $n - 3$ (or in general $n - p$) is the number of degrees of freedom of the $\chi^2$-distribution of the numerator of $s^2$. If we compare expectation, variance, skewness and kurtosis of a $\chi^2$-distribution with $n - 3$ degrees of freedom with the corresponding empirical values from the simulation experiment, we see that even for the smallest possible $n = 4$, a good accordance is found. This means that $n - 3$ is a good choice for the denominator in the estimator of $\sigma^2$.

Table 9.7 shows the relative frequencies of confidence estimations and tests with $\alpha_{nom} = 0.05$ and $\alpha_{nom} = 0.1$ for a special parameter configuration, from 10 000 runs each. As we can see, already with $n = 4$ there is sufficient agreement between $\alpha_{nom}$ and $\alpha_{act}$. Therefore the tests in Section 9.6.1.2 can be used as approximate $\alpha_{nom}$-tests and the confidence intervals as approximate
Table 9.5 Empirical variances $s_a^2$ and $s_b^2$ with the asymptotic variances $\mathrm{var}_A(a)$ and $\mathrm{var}_A(b)$ of the estimates of $\alpha$ and $\beta$ ($\sigma^2 = 1$) for $n = 4$ and $n = 6$.


-107y n 10°s? 10°var,(a) 10°s? 10°var,(b) 
3 4 878224 800768 853658 780404 
6 680837 613678 611028 547540 
5 4 197339 187157 266298 260565 
6 130694 129982 178729 182512 
7 4 105017 98990 197639 191016 
6 64415 63300 145588 144533 
9 4 71968 73079 170001 170312 
6 44366 44152 137567 135566 


Table 9.6 Empirical variances (upper value) and asymptotic variances 
(lower value) of the estimate of y for n=4 multiplied by 10°, o? = 1. 


-107y B= -70 = -50 B= -30 
3 7751 15535 42.825 
7512 14723 40897 

5 10475 20416 57989 
10199 19989 55251 

7 19087 38258 117038 
18873 36992 102754 

9 39407 80225 281944 
39407 77238 214550 


$(1 - \alpha_{nom})$ confidence intervals. The power function of the tests and the behaviour of the tests for non-equidistant supports were evaluated in Rasch and Schimke (1983). In summary, the methods based on the asymptotic covariance matrix are satisfactory already for $n = 4$ and about 90% robust against non-normality in the Fleishman system.


9.6.1.4 Experimental Designs 
To find locally D-optimal designs, we can use Theorem 9.9. During extensive 
searches of optimal designs, not only in the class of three-point designs optimal 


Table 9.7 Relative frequencies of 10 000 simulated samples for the (incorrect) rejection (left-hand $n_l$, right-hand $n_u$) and for the (correct) acceptance ($n_M$) of $H_0$ for the exponential regression with $\alpha = 0$, $\beta = -50$, $\gamma = -0.05$, $n = 10(-1)4$ and $\alpha_{nom} = 0.05$ and $\alpha_{nom} = 0.1$.


Onom = 0.05 Qnom = 9.1 

n Ny No nu Ny No Nm 
Agog: a=0 
10 2.71 2.03 95.26 5.36 4.01 90.63 
9 3.17 2.07 94.76 6.23 4.44 89.33 
8 2.51 2.03 95.46 5.06 4.28 90.66 
7 2.59 2.03 95.38 5.24 4.54 90.22 
6 2.98 2.04. 94.98 5.52 4.26 90.22 
5 2.80 2.19 95.01 5.57 4.22 90.21 
4 2.66 2.41 94.93 5.12 4.96 89.92 
Hog: B= —50 
10 2.44. 2.31 95.25 4.97 4.48 90.55 
9 2.43 2.44 95.13 5.11 4.85 90.04. 
8 2.46 2.21 95.33 5.01 4.38 90.61 
7 2.74 2.01 95.25 5.26 4.46 90.28 
6 2.63 2.48 94.89 5.32 4.92 89.76 
5 2.37 2.49 95.14 4.87 5.03 90.10 
4 2.59 2.27 95.14 5.34 4.80 89.86 
Ao, :y = —0.05 
10 2.50 2.26 95.24 4.99 4.48 90.53 
9 2.76 2.52 94.72 5.72 4.90 89.38 
8 2.85 2.08 95.07 5.39 4.40 90.21 
7 2.79 1.82 95.39 5.26 4.33 90.41 
6 2.68 2.39 94.93 5.17 4.59 90.24. 
5 2.63 2.43 94.94. 4.80 4.97 90.23 
4 2.56 2.35 95.09 5.42 4.72 89.86 
Table 9.8 Optimal experimental designs in the experimental region [1,12] and n = 12.


Criterion (B, y) = (-17, -0.14) (B, 7) = (-19, -0.2) (B, 7) = (-14, -0.08) 


D 15.14 12 1 4.63 12 15.7 12 
4 4 4 4 4 


Ca 1 5.08 7 1 4.61 _ 


ae oe 
Step tery ony 
tot il cao Me 


1 5.02 12 1 4.48 12 
3 3 


designs have been found, which are three-point designs as derived by Box and Lucas for $n = 3$ in Theorem 9.9. By search methods for the locally $C_\alpha$-, $C_\beta$- and A-optimality, we found that the optimal designs were always three-point designs in $[x_l, x_u]$ with $x_1 = x_l$ and $x_3 = x_u$. For the $C_\gamma$-optimality, often one of the bounds of the experimental region did not belong to the support of the locally $C_\gamma$-optimal design, but the optimal designs were always three-point designs. In Table 9.8 we report some results of our searches using the parameters and experimental regions of Example 9.7.

We can now compare the criterion value of the D-optimal design
\[ \begin{pmatrix} 1 & 5.14 & 12 \\ 4 & 4 & 4 \end{pmatrix}, \]
which is $0.00015381\,\sigma^6$, with that of the design used in the experiment,
\[ \begin{pmatrix} 1 & 2 & \cdots & 12 \\ 1 & 1 & \cdots & 1 \end{pmatrix}, \]
which had a criterion value of $0.0004782\,\sigma^6$; this is 3.11 times larger than that of the optimal design. The criterion value of the $C_\gamma$-optimal design is $0.00154\,\sigma^2$ and that for the design used in the experiment is $0.00305\,\sigma^2$.

It can generally be stated for all models and optimality criteria that an equidistant design in the experimental region with one observation at each support point is far from being optimal.
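The comparison of D-criterion values can be reproduced with a short computation of $|\mathrm{var}_A(\theta)| = \sigma^6 / |F^T F|$ for the exponential regression. The sketch below assumes the parameter values $(\beta, \gamma) = (-17, -0.14)$ of the first column of Table 9.8; the helper names and the representation of a design as a list of (support point, frequency) pairs are illustrative choices.

```python
import math


def gradient(x, beta, gamma):
    """Derivative of alpha + beta*exp(gamma*x) with respect to (alpha, beta, gamma)."""
    e = math.exp(gamma * x)
    return (1.0, e, beta * x * e)


def information_matrix(design, beta, gamma):
    """F^T F for a design given as (support point, frequency) pairs."""
    m = [[0.0] * 3 for _ in range(3)]
    for x, reps in design:
        g = gradient(x, beta, gamma)
        for j in range(3):
            for k in range(3):
                m[j][k] += reps * g[j] * g[k]
    return m


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def d_criterion(design, beta, gamma):
    """|var_A(theta)| in units of sigma^6."""
    return 1.0 / det3(information_matrix(design, beta, gamma))


D_OPTIMAL = [(1.0, 4), (5.14, 4), (12.0, 4)]
EQUIDISTANT = [(float(x), 1) for x in range(1, 13)]

if __name__ == "__main__":
    d_opt = d_criterion(D_OPTIMAL, -17.0, -0.14)
    d_equi = d_criterion(EQUIDISTANT, -17.0, -0.14)
    print(d_opt, d_equi, d_equi / d_opt)
```

Under these assumptions the equidistant design comes out roughly three times worse than the D-optimal three-point design, in line with the factor 3.11 quoted above.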


9.6.2 The Bertalanffy Function 


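The regression function $f_B(x)$ of the model
\[ y_i = \left( \alpha + \beta e^{\gamma x_i} \right)^3 + e_i = f_B(x_i) + e_i, \quad i = 1, \dots, n, \quad n > 3, \tag{9.69} \]
is called the Bertalanffy function and was used by Bertalanffy (1929) to describe the growth of body weight of animals. This function has two inflection points if $\alpha$ and $\beta$ have different signs; they are located at
\[ x_{I1} = \frac{1}{\gamma} \ln\left( -\frac{\alpha}{\beta} \right) \quad \text{and} \quad x_{I2} = \frac{1}{\gamma} \ln\left( -\frac{\alpha}{3\beta} \right). \]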


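With $\theta = (\theta_1, \theta_2, \theta_3)^T = (\alpha, \beta, \gamma)^T$ and with the notation of Definition 9.1, we obtain
\[ \frac{\partial f_B(x, \theta)}{\partial \theta} = 3\left( \alpha + \beta e^{\gamma x} \right)^2 \begin{pmatrix} 1 \\ e^{\gamma x} \\ \beta x e^{\gamma x} \end{pmatrix}, \]
and by this all components of $\theta$ are non-linearity parameters. Analogously to (9.63) we use abbreviations:
\[ z_i = \left( \alpha + \beta e^{\gamma x_i} \right)^4, \]
\[ A = \sum_{i=1}^n z_i, \quad B = \sum_{i=1}^n z_i e^{\gamma x_i}, \quad C = \sum_{i=1}^n z_i x_i e^{\gamma x_i}, \]
\[ D = \sum_{i=1}^n z_i e^{2\gamma x_i}, \quad E = \sum_{i=1}^n z_i x_i e^{2\gamma x_i}, \quad G = \sum_{i=1}^n z_i x_i^2 e^{2\gamma x_i}. \]
Then
\[ F^T F = 9 \begin{pmatrix} A & B & \beta C \\ B & D & \beta E \\ \beta C & \beta E & \beta^2 G \end{pmatrix} \]
and
\[ |F^T F| = 9^3 \beta^2 \left[ ADG + 2BCE - C^2 D - E^2 A - B^2 G \right] = 9^3 \beta^2 \Delta. \]
The asymptotic covariance matrix is therefore
\[ \mathrm{var}_A(\theta) = \sigma^2 (F^T F)^{-1} = \frac{\sigma^2}{9\Delta} \begin{pmatrix} DG - E^2 & EC - BG & \frac{1}{\beta}(BE - CD) \\ EC - BG & AG - C^2 & \frac{1}{\beta}(BC - AE) \\ \frac{1}{\beta}(BE - CD) & \frac{1}{\beta}(BC - AE) & \frac{1}{\beta^2}(AD - B^2) \end{pmatrix}. \tag{9.70} \]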
To determine the initial values, it is recommended to transform the $y_i$-values of $(x_i, y_i)$ $(i = 1, \dots, n)$ to
\[ v_i = \sqrt[3]{y_i} \]
and to estimate from $(x_i, v_i)$ the parameters $\alpha$, $\beta$, $\gamma$ of an exponential regression as in Section 9.6.1. These estimates $a^*$, $b^*$, $c^*$ are used as initial values for the iterative determination of the LS estimates $a$, $b$, $c$ of the Bertalanffy function.
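Concerning the hypothesis testing, we obtain from (9.70) for $n > 3$ the test statistics

for $H_{0\alpha}: \alpha = \alpha_0$ against $H_{A\alpha}: \alpha \neq \alpha_0$
\[ t_\alpha = \frac{(a - \alpha_0)\, 3 \sqrt{\hat{\Delta}}}{s \sqrt{\hat{D}\hat{G} - \hat{E}^2}}, \]
for $H_{0\beta}: \beta = \beta_0$ against $H_{A\beta}: \beta \neq \beta_0$
\[ t_\beta = \frac{(b - \beta_0)\, 3 \sqrt{\hat{\Delta}}}{s \sqrt{\hat{A}\hat{G} - \hat{C}^2}}, \]
for $H_{0\gamma}: \gamma = \gamma_0$ against $H_{A\gamma}: \gamma \neq \gamma_0$
\[ t_\gamma = \frac{(c - \gamma_0)\, 3b \sqrt{\hat{\Delta}}}{s \sqrt{\hat{A}\hat{D} - \hat{B}^2}}, \]
and the confidence intervals in Section 9.4. The symbols $\hat{A}, \dots, \hat{\Delta}$ are defined as in Section 9.6.1 and $s$ is the square root of
\[ s^2 = \frac{1}{n - 3} \sum_{i=1}^n \left[ y_i - \left( a + b e^{c x_i} \right)^3 \right]^2 \quad (n > 3). \]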


Example 9.7 — Continued 

We now use the oil palm data to estimate the parameters of the Bertalanffy 
function. The results (of the CADEMO package) are shown in Figure 9.12. 

The estimated regression curve is shown in Figure 9.13. 

Schlettwein (1987) did the simulation experiments described in Section 9.4.3 
with normally distributed e; and for several parameter combinations and n- 
values. Some results are shown in Table 9.9. 

These and the other results of Schlettwein allow the conclusion that for normally distributed $e_i$ in model (9.69), the asymptotic tests and confidence intervals are appropriate for all $n \geq 4$.

We now give in Table 9.10 optimal designs analogously to those in 
Section 9.6.3. 
[CADEMO Growth output for oilpalm.dat: estimation of the growth parameters with automatic calculation of the initial values. Growth function Bertalanffy(3), F(x) = [A + B*exp(C*x)]^3; least squares estimates, confidence limits (α = 0.05) and estimated residual variance 0.2181 for sample size n = 12.]


Figure 9.12 LS estimates, asymptotic confidence intervals and the estimated residual 
variance of the Bertalanffy function with data of Example 9.7. 


Figure 9.13 Curve of the estimated Bertalanffy regression function together with the scatter 
plot of the observations of Example 9.7. 
Table 9.9 Relative frequencies of 10 000 simulated samples for the (incorrect) rejection (left-hand $n_l$, right-hand $n_u$) and for the (correct) acceptance ($n_M$) of $H_0$ for the Bertalanffy function with several parameter values and for $\alpha_{nom} = 0.01$, 0.05 and 0.1 and $n = 4$.


nom = 0.01 Qnom = 0.05 nom = 0.10 
Ho Nu No nm Nu No Nu Nu No Nm 
a=5 0.46 0.48 99.06 2.32 2.40 95.28 4.73 5.07 90.20 
p=-2 0.43 0.44. 99.13 2.48 2.26 95.26 5.10 4.81 90.09 
y = -0.05 0.54. 0.42 99.04. 2.55 2.39 95.06 5.06 4.51 90.43 
a=5 0.51 0.58 98.91 2.53 2.59 94.88 5.08 4.96 89.96 
p=-2 0.61 0.57 98.82 2.65 2.34 95.01 4.98 4.75 90.27 



0.49 0.44. 99.07 2.64 2.26 95.10 5.16 5.78 90.06 
0.53 0.59 98.88 2.61 2.60 94.79 5.02 5.28 89.70 
0.49 0.69 98.82 2.46 2.88 94.66 4.97 5.40 89.63 
0.57 0.66 98.77 2.59 2.64 94.77 5.18 5.17 89.65 
0.44 0.57 98.99 2.33 2.53 95.14 4.62 5.51 89.87 
0.47 0.59 98.94 2.32 2.60 95.08 4.81 5.28 89.91 
0.52 0.58 89.90 2.47 2.40 95.13 5.24 4.72 90.04. 
0.51 0.51 98.98 2.52 2.75 94.73 4.88 5.30 89.82 
0.50 0.53 98.97 2.38 2.75 94.87 4.47 5.34 90.19 
0.49 0.52 98.99 2.65 2.38 94.97 5.16 4.85 89.99 
0.47 0.53 99.00 2.32 2.37 95.31 4.73 4.62 90.65 
0.54. 0.50 98.96 2.54 2.20 95.26 5.04 4.82 90.14. 
0.57 0.57 98.86 2.37 2.33 95.30 4.98 4.78 90.24 



Table 9.10 Optimal experimental designs in the experimental region [1,12] and n = 12. 


(a, B, 7) = (a, B, 7) = (a, B, 7) = 
Criterion (2.44, -1.43, —0.24) (2.35, —1.63, —0.31) (2.54, -1.23, —0.17) 
D 1 5.39 12 1.19 5.05 12 1 5.88 12 
4 4 4 4 


Ca 1 5.55 em 1 5.09 m) 1 5.72 mm 


2 2 


eo 
‘ 5 Fo Ses 
oo ie. 


1 5.54 4 1 5.23 4 1 5.92 » 
9.6.3 The Logistic (Three-Parametric Hyperbolic Tangent) Function
The function $f_L(x, \theta)$ of the model
\[ y_i = \frac{\alpha}{1 + \beta e^{\gamma x_i}} + e_i = f_L(x_i, \theta) + e_i, \quad i = 1, \dots, n, \quad n > 3, \quad \alpha \neq 0, \ \beta > 0, \ \gamma \neq 0, \tag{9.71} \]
is called the logistic function. It has an inflection point at
\[ x_I = -\frac{1}{\gamma} \ln \beta \]
with $f_L(x_I, \theta) = \alpha/2$.
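The function in (9.71) can be written as a three-parametric hyperbolic tangent function with the parameters
\[ \alpha_T = \frac{\alpha}{2}, \quad \beta_T = -\frac{1}{\gamma} \ln \beta \quad \text{and} \quad \gamma_T = -\frac{\gamma}{2} \]
(see Example 9.4). With $\theta_1 = \alpha_T$, $\theta_2 = \beta_T$ and $\theta_3 = \gamma_T$,
\[ y_i = \alpha_T \{ 1 + \tanh[\gamma_T (x_i - \beta_T)] \} + e_i, \quad i = 1, \dots, n, \quad n > 3, \quad \alpha_T \neq 0, \ \beta_T \neq 0, \ \gamma_T \neq 0, \tag{9.72} \]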

is the regression model of a three-parametric hyperbolic tangent function. 
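The equivalence of the two parametrisations follows from $(1 + \tanh u)/2 = 1/(1 + e^{-2u})$ with $u = \gamma_T (x - \beta_T)$, and can be checked numerically. The sketch below is illustrative; the function names are assumptions, and the parameter values $(\alpha, \beta, \gamma) = (126, 20, -0.46)$ are those of the first column of Table 9.11.

```python
import math


def logistic(x, alpha, beta, gamma):
    """Logistic function alpha / (1 + beta*exp(gamma*x)) of model (9.71)."""
    return alpha / (1.0 + beta * math.exp(gamma * x))


def to_tanh_parameters(alpha, beta, gamma):
    """Map (alpha, beta, gamma) to (alpha_T, beta_T, gamma_T); requires beta > 0."""
    return alpha / 2.0, -math.log(beta) / gamma, -gamma / 2.0


def tanh_form(x, alpha_t, beta_t, gamma_t):
    """Three-parametric hyperbolic tangent function of model (9.72)."""
    return alpha_t * (1.0 + math.tanh(gamma_t * (x - beta_t)))


if __name__ == "__main__":
    theta = (126.0, 20.0, -0.46)
    theta_t = to_tanh_parameters(*theta)
    for x in (1.0, 5.0, 9.0, 14.0):
        print(x, logistic(x, *theta), tanh_form(x, *theta_t))
```

The parameter $\beta_T$ is the inflection point of the curve, where both forms take the value $\alpha/2 = \alpha_T$.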

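From Section 9.2 it follows that a reparametrisation can change the non-linearity properties. It therefore seems reasonable to look for a reparametrisation with a small curvature measure. We first treat model (9.71) and obtain
\[ \frac{\partial f_L(x, \theta)}{\partial \theta} = \begin{pmatrix} \dfrac{1}{1 + \beta e^{\gamma x}} \\[2ex] \dfrac{-\alpha e^{\gamma x}}{(1 + \beta e^{\gamma x})^2} \\[2ex] \dfrac{-\alpha\beta x e^{\gamma x}}{(1 + \beta e^{\gamma x})^2} \end{pmatrix}, \]
and the model can be written with
\[ C(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\alpha & 0 \\ 0 & 0 & -\alpha \end{pmatrix} \]
in the form (9.1). We see that $\beta$ and $\gamma$ are non-linearity parameters. Analogously this can also be stated for the function in (9.72).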
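The information matrix for model (9.71) is
\[ F^T F = \begin{pmatrix} A & -\alpha B & -\alpha\beta C \\ -\alpha B & \alpha^2 D & \alpha^2 \beta E \\ -\alpha\beta C & \alpha^2 \beta E & \alpha^2 \beta^2 G \end{pmatrix} \]
with
\[ Z_i = \left( 1 + \beta e^{\gamma x_i} \right)^{-1} \]
and
\[ A = \sum_{i=1}^n Z_i^2, \quad B = \sum_{i=1}^n Z_i^3 e^{\gamma x_i}, \quad C = \sum_{i=1}^n Z_i^3 x_i e^{\gamma x_i}, \]
\[ D = \sum_{i=1}^n Z_i^4 e^{2\gamma x_i}, \quad E = \sum_{i=1}^n Z_i^4 x_i e^{2\gamma x_i}, \quad G = \sum_{i=1}^n Z_i^4 x_i^2 e^{2\gamma x_i}. \]
Then
\[ |F^T F| = \alpha^4 \beta^2 \left[ ADG + 2BCE - C^2 D - AE^2 - B^2 G \right] = \alpha^4 \beta^2 \Delta \]
follows and the asymptotic covariance matrix is
\[ \mathrm{var}_A(\theta) = \sigma^2 (F^T F)^{-1} = \frac{\sigma^2}{\Delta} \begin{pmatrix} DG - E^2 & -\frac{1}{\alpha}(EC - BG) & -\frac{1}{\alpha\beta}(BE - CD) \\ -\frac{1}{\alpha}(EC - BG) & \frac{1}{\alpha^2}(AG - C^2) & \frac{1}{\alpha^2 \beta}(BC - AE) \\ -\frac{1}{\alpha\beta}(BE - CD) & \frac{1}{\alpha^2 \beta}(BC - AE) & \frac{1}{\alpha^2 \beta^2}(AD - B^2) \end{pmatrix}. \]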

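Initial values for the iterative calculation of the LS estimates are found by internal regression (see Section 9.1.2). The differential equation with integral $f_L(x, \theta)$ is given by
\[ \frac{\partial f_L(x, \theta)}{\partial x} = -\gamma f_L(x, \theta) \left( 1 - \frac{1}{\alpha} f_L(x, \theta) \right). \]
Minimising
\[ S_1 = \sum_{i=1}^{n-1} \left( c_1 y_i + c_2 y_i^2 - y_i^* \right)^2, \quad c_1 \neq 0, \ c_2 \neq 0, \quad \text{with} \quad y_i^* = \frac{y_{i+1} - y_i}{x_{i+1} - x_i} \quad (i = 1, \dots, n - 1), \]
by the LS method results in $\hat{c}_1$ and $\hat{c}_2$. Comparing coefficients with the differential equation ($c_1 = -\gamma$, $c_2 = \gamma/\alpha$), initial values $a$ and $c$ for the estimator of $\alpha$ and $\gamma$ are given by
\[ c = -\hat{c}_1, \quad a = -\frac{\hat{c}_1}{\hat{c}_2}. \]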


The initial value # for the estimator of f is that value b = b, minimising 


n-1 2 
Sy = > (x +b a =), 


i=1 


The initial values a7, by and cry for the hyperbolic tangent function can be 
gained from those of the logistic function using the parameter transformation 
in front of (9.72). 

The information matrix F’F of the model (9.72) has with the abbreviations 


Ui = eee 


the form 
Ar -ayBr aCr 
F'F= -ayBr @y’Dr -c’yEr 
aCr -@yEr @Gr 
with 


|FTF| =a*y’ |ArDrGr + 2BrCrEr -CpDr -AzEr -B7Gr| =a4y’ Ar. 


The asymptotic covariance matrix of the estimator 0% of 02 = (ar,B 7,77) is 
given by 


1 1 
DrGy= EP = oy Etc -BrGr) —(BrEr-DrCr) 


1 1 1 
vara (Or) = Bie BrGr) aap AiG - Cr) Bap aeeT) 


1 1 1 ; 
~ (BrEr~CrDr) Dy (BrCr -ArEr) —(ArDr-B?) 
Table 9.11 Optimal experimental designs in the experimental region [1,14] and n = 14.


Criterion (a, B, 7) = (126, 20, -0.46) (a, B, y) = (123, 16,-0.5) (a, B, 7) = (130, 23, -0.42) 


D 3.93 8.29 14 3.30 7.39 14 4.42 9.00 14 
5 5 4 5 5 4 5 5 4 
Ca 1.15 8,07 14 1.73 7.34 14 2.81 9.20 14 
2 3.9 2 2 10 3 3. 8 
Cp 2.70 8.95 14 2.16 2.37 14 3.11 3.46 9.75 
ll 2 61 11 1 2 11 1 2 
C, 2.38 8.31 14 1.81 7.37 14 2.80 9.12 14 
8 4 2 8 4 2 8 4 2 


For $n > 3$ test statistics and confidence intervals can be written down analogously to the sections above.

Further
\[ s_T^2 = \frac{1}{n - 3} \sum_{i=1}^n \left( y_i - a_T - a_T \tanh[c_T (x_i - b_T)] \right)^2 \]
is the residual variance. In Example 9.3 the curve fitting of a logistic function was demonstrated by SPSS.

Simulation experiments as described in Section 9.4.3 have been performed for 15 $(\alpha, \beta, \gamma)$-combinations (with inflection points at 10, 30 and 50, respectively), $x_i$-values in [0,65], normally distributed $e_i$ and $\alpha_{nom} = 0.05$ and 0.1. For all parameter combinations, the result was that tests and confidence estimations based on the asymptotic covariance matrix can be recommended for all $n > 3$, not only for normally distributed $e_i$ but also for $e_i$ following some Fleishman distributions.

All concrete optimal designs have been three-point designs. In Table 9.11 we 
give optimal designs for the estimates of the parameters and for the confidence 
bounds in Figure 9.5 (hemp growth of Example 9.3). 


9.6.4 The Gompertz Function 
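The regression function $f_G(x, \theta)$ of the model
\[ y_i = \alpha e^{\beta e^{\gamma x_i}} + e_i = f_G(x_i, \theta) + e_i \quad (i = 1, \dots, n, \ n > 3, \ \alpha \neq 0, \ \gamma \neq 0, \ \beta < 0) \tag{9.73} \]
is called the Gompertz function. In Gompertz (1825) it was used to describe population growth. The function has an inflection point at
\[ x_I = -\frac{\ln(-\beta)}{\gamma} \]
with $f_G(x_I) = \dfrac{\alpha}{e}$.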
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The vector 
1 
— f(x) 
afalna) | a ore 
00 my ta(x)e* , , (a,B,7) 
G(x) Boxe” 
can be written with 
10 0 
C (0) =!0 l/a 0 
00 = I/a 


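in the form (9.1), where $\beta$ and $\gamma$ are non-linearity parameters. We again use abbreviations
\[ A = \sum_{i=1}^n e^{2\beta e^{\gamma x_i}}, \quad B = \sum_{i=1}^n e^{\gamma x_i} e^{2\beta e^{\gamma x_i}}, \quad C = \sum_{i=1}^n x_i e^{\gamma x_i} e^{2\beta e^{\gamma x_i}}, \]
\[ D = \sum_{i=1}^n e^{2\gamma x_i} e^{2\beta e^{\gamma x_i}}, \quad E = \sum_{i=1}^n x_i e^{2\gamma x_i} e^{2\beta e^{\gamma x_i}}, \quad G = \sum_{i=1}^n x_i^2 e^{2\gamma x_i} e^{2\beta e^{\gamma x_i}}, \]
so that
\[ F^T F = \begin{pmatrix} A & \alpha B & \alpha\beta C \\ \alpha B & \alpha^2 D & \alpha^2 \beta E \\ \alpha\beta C & \alpha^2 \beta E & \alpha^2 \beta^2 G \end{pmatrix} \]
and
\[ |F^T F| = \alpha^4 \beta^2 \Delta = \alpha^4 \beta^2 \left[ ADG + 2BCE - C^2 D - AE^2 - B^2 G \right] \neq 0. \]
The asymptotic covariance matrix is therefore
\[ \mathrm{var}_A(\theta) = \sigma^2 (F^T F)^{-1} = \frac{\sigma^2}{\Delta} \begin{pmatrix} DG - E^2 & \frac{1}{\alpha}(EC - BG) & \frac{1}{\alpha\beta}(BE - CD) \\ \frac{1}{\alpha}(EC - BG) & \frac{1}{\alpha^2}(AG - C^2) & \frac{1}{\alpha^2 \beta}(BC - AE) \\ \frac{1}{\alpha\beta}(BE - CD) & \frac{1}{\alpha^2 \beta}(BC - AE) & \frac{1}{\alpha^2 \beta^2}(AD - B^2) \end{pmatrix}. \]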


Initial values for the iterative calculation of parameter estimates can be found by reducing the problem to that of the exponential regression using $z_i = \ln y_i$ $(y_i > 0)$.

Because
\[ \ln f_G(x, \theta) = \ln \alpha + \beta e^{\gamma x} = \alpha_E + \beta e^{\gamma x} \quad \text{with} \quad \alpha_E = \ln \alpha, \]
Table 9.12 Relative frequencies of 10 000 simulated samples for the (incorrect) rejection (left-hand $n_l$, right-hand $n_u$) and for the (correct) acceptance ($n_M$) of $H_0$ for the Gompertz function with several parameter values and for $\alpha_{nom} = 0.01$, 0.05 and 0.1 and $n = 4$.


nom = 0.01 nom = 0.05 nom = 0.10 

Ho ny Nu nm ny Nu Qu ny Nu nu 
a = 33.33 056 048 98.96 2.58 2.003 95.39 5.27 417 90.56 
B = -6.05 058 046 98.96 333 244 9423 611 475 89.14 
y = -0.06 045 #052 99.03 2.26 2.73 95.01 4.76 5.65 89.59 
a = 33.33 0.47 = 0.41 99.12 2.61 2.07 95.32 5.01 422 90.77 
P= -11.023 046 053 99.01 246 2.16 95.38 484 454 90.62 
y = -0.08 050 045 99.05 217 2.36 95.47 447 4.77 90.76 
a = 33.33 049 0.50 99.01 2.65 2.53 9482 5.07 5.11 89.82 
B = -20.09 046 045 99.09 2.41 2.80 94.79 4.77 5.28 89.95 
B = -0.10 044 043 99.13 264 246 94.90 5.24 4.85 89.91 
a= 100 0.61 0.51 98.88 2.77 249 94.74 5.35 453 90.12 
fp = -36.6 049 047 99.04 2.68 2.37 94.95 5.21 5.01 89.78 

= -0.06 0.50 056 98.94 248 2.55 9497 477 5.20 90.03 

= 25 0.52 0.39 99.09 2.67 1.96 95.37 5.89 3.75 90.36 

= -6.05 0.65 043 9892 293 243 9464 613 473 89.14 


= -0.06 044 060 98.96 246 2.69 94.85 4.70 5.29 90.01 
0.59 0.50 98.91 2.59 2.12 95.29 5.26 4.39 90.35 
=-11.023 047 0.50 99.03 2.68 2.19 95.13 5.20 4.61 90.19 
= -0.08 0.51 0.47 99.02 2.19 2.43 95.38 4.35 4.86 90.79 
= 25 0.55 044 99.01 2.69 1.98 95.33 5.28 424 90.48 
= -20.09 0.43 0.50 99.07 2.16 2.24 9560 4.61 4.39 91.00 
= -0.10 0.58 047 98.95 2.27 2.13 95.60 448 452 91.00 



from the initial values (or LS estimates) $a_E$, $b_E$, $c_E$ of the exponential regression for the $(z_i, x_i)$, initial values $a' = e^{a_E}$, $b' = b_E$ and $c' = c_E$ for the Gompertz function can be obtained.
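The log-transformation and the inflection point of the Gompertz function can be illustrated as follows. This is a minimal sketch with hypothetical function names; the parameter values $(\alpha, \beta, \gamma) = (33.33, -6.05, -0.06)$ are taken from the first block of Table 9.12, for which the inflection point lies close to $x = 30$.

```python
import math


def gompertz(x, alpha, beta, gamma):
    """Gompertz function alpha * exp(beta * exp(gamma * x)) of model (9.73)."""
    return alpha * math.exp(beta * math.exp(gamma * x))


def inflection_point(beta, gamma):
    """x_I = -ln(-beta) / gamma (requires beta < 0)."""
    return -math.log(-beta) / gamma


def back_transform(a_e, b_e, c_e):
    """Initial values for the Gompertz function from an exponential fit to ln(y)."""
    return math.exp(a_e), b_e, c_e


if __name__ == "__main__":
    alpha, beta, gamma = 33.33, -6.05, -0.06
    x_i = inflection_point(beta, gamma)
    print(x_i, gompertz(x_i, alpha, beta, gamma), alpha / math.e)
    # ln f_G is an exponential regression function with alpha_E = ln(alpha).
    print(math.log(gompertz(10.0, alpha, beta, gamma)),
          math.log(alpha) + beta * math.exp(gamma * 10.0))
```

In practice $a_E$, $b_E$, $c_E$ would come from fitting the exponential regression to the pairs $(x_i, \ln y_i)$, which requires all $y_i > 0$.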

Tests and confidence regions can be constructed analogously to the sections above. Numerous simulation experiments have been performed to show how well those tests perform even for small sample sizes. In Table 9.12 we give the results for $n = 4$. In general it can be said that tests and confidence regions approximately hold the nominal risks and can always be recommended.
[CADEMO Growth output for oilpalm.dat: estimation of the growth parameters with automatic calculation of the initial values. Growth function Gompertz(3), F(x) = A*exp(B*exp(C*x)); least squares estimates, confidence limits (α = 0.05) and estimated residual variance for sample size n = 12.]

Parameter   Estimate   Lower limit   Upper limit
A            14.1430     12.6069       15.6791
B            -2.3470     -2.8303       -1.8636
C            -0.2850                   -0.2035


Figure 9.14 LS estimates, asymptotic confidence intervals and the estimated residual 
variance of the Gompertz function with data of Example 9.7. 


Figure 9.15 Curve of the estimated Gompertz regression function together with the scatter 
plot of the observations of Example 9.7. 
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Example 9.7 — Continued 

We can now use the oil palm data of Table 9.3 to estimate the parameters of 
the Gompertz function by the program Growth (see Section 9.6.1.2). The results 
are shown in Figure 9.14. The estimated regression curve is shown in 
Figure 9.15. 

All concrete optimal designs have been three-point designs. In Table 9.13 we 
give optimal designs for the estimates of the parameters and for the confidence 
bounds in Figure 9.14 (oil palm growth). 


9.6.5 The Hyperbolic Tangent Function with Four Parameters 
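We consider the regression model

$$y_i = \alpha + \beta \tanh(\gamma + \delta x_i) + e_i = f_T(x_i, \theta) + e_i \quad (i = 1, \ldots, n,\ n > 4,\ \beta > 0,\ \delta > 0).$$ (9.74)

$f_T(x, \theta)$ has an inflection point at $x_\omega = -\gamma/\delta$ with $f_T(x_\omega, \theta) = \alpha$ and two asymptotes at $y = \alpha + \beta$ and $y = \alpha - \beta$, respectively.

Because

$$\frac{\partial f_T(x, \theta)}{\partial \theta} = \begin{pmatrix} 1 \\ \tanh(\gamma + \delta x) \\ \beta[1 - \tanh^2(\gamma + \delta x)] \\ \beta x[1 - \tanh^2(\gamma + \delta x)] \end{pmatrix},$$
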

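Table 9.13 Optimal experimental designs in the experimental region [1, 12] and n = 12. Each design is given by its support points (upper row) and the numbers of observations at these points (lower row).

Criterion   (α, β, γ) = (14.14, -2.35, -0.285)   (α, β, γ) = (12.61, -2.83, -0.37)   (α, β, γ) = (15.68, -1.86, -0.2)
D           (1   5.56   12)                      (1.29   5.14   12)                  (1   6.09   12)
            (4   4      4 )                      (4      4      4 )                  (4   4      4 )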


C_α         (1   5.65   ...)                     (1   4.90   12)                     (1   6.22   12)
C_β         (1   7.94   9.09)                    (1   5.47   5.95)                   (1.51   11.21   12)
C_γ         (...   5.39   ...)                   (...   6.26   ...)
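$C(\theta)$ in (9.1) can be chosen so that $\gamma$ and $\delta$ are non-linearity parameters.
With $v_i = \tanh(\gamma + \delta x_i)$ and

$$A = \sum_{i=1}^n v_i, \quad B = \sum_{i=1}^n v_i^2, \quad C = \sum_{i=1}^n x_i, \quad D = \sum_{i=1}^n x_i v_i^2, \quad E = \sum_{i=1}^n v_i^3,$$

$$G = \sum_{i=1}^n v_i^4, \quad H = \sum_{i=1}^n x_i v_i^4, \quad I = \sum_{i=1}^n x_i v_i, \quad K = \sum_{i=1}^n x_i^2,$$

$$L = \sum_{i=1}^n x_i^2 v_i^2, \quad M = \sum_{i=1}^n x_i v_i^3, \quad N = \sum_{i=1}^n x_i^2 v_i^4,$$

$F^TF$ becomes

$$F^TF = \begin{pmatrix} n & A & \beta(n-B) & \beta(C-D) \\ A & B & \beta(A-E) & \beta(I-M) \\ \beta(n-B) & \beta(A-E) & \beta^2(n-2B+G) & \beta^2(C-2D+H) \\ \beta(C-D) & \beta(I-M) & \beta^2(C-2D+H) & \beta^2(K-2L+N) \end{pmatrix} = \begin{pmatrix} n & A & \beta P & \beta Q \\ A & B & \beta R & \beta S \\ \beta P & \beta R & \beta^2 T & \beta^2 U \\ \beta Q & \beta S & \beta^2 U & \beta^2 W \end{pmatrix}$$

with $P = n - B$, $Q = C - D$, $R = A - E$, $S = I - M$, $T = n - 2B + G$, $U = C - 2D + H$ and $W = K - 2L + N$, and the asymptotic covariance matrix is

$$\mathrm{var}_A(\theta) = \sigma^2 (F^TF)^{-1} = (\sigma_{\xi\eta}), \quad \xi, \eta = a, b, c, d.$$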


To reach numerical stability in simulation experiments, it is favourable to invert $F^TF$ analytically and then to insert the $x_i$-values. The formulae for the elements of $\mathrm{var}_A(\theta)$ are given in Gretzebach (1986). We give below the main diagonal elements $\sigma_{\xi\xi} = \sigma_\xi^2$ ($\xi = a, b, c, d$), that is, the asymptotic variances of the LS estimators a, b, c and d, writing $\Delta = |F^TF|$:

$$\sigma_a^2 = \frac{\sigma^2\beta^4}{\Delta}\left[BTW + 2RSU - R^2W - S^2T - U^2B\right],$$

$$\sigma_b^2 = \frac{\sigma^2\beta^4}{\Delta}\left[nTW + 2QPU - TQ^2 - P^2W - nU^2\right],$$

$$\sigma_c^2 = \frac{\sigma^2\beta^2}{\Delta}\left[nBW + 2AQS - BQ^2 - A^2W - nS^2\right],$$

$$\sigma_d^2 = \frac{\sigma^2\beta^2}{\Delta}\left[nBT + 2ARP - BP^2 - A^2T - nR^2\right].$$


The initial values should be found by internal regression, leading to the LS estimates a, b, c and d for $\alpha$, $\beta$, $\gamma$ and $\delta$. Test statistics and confidence estimations are obtained analogously to the sections above.
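The closed-form diagonal elements can be checked numerically against a direct inversion of $F^TF$. The sketch below assumes that $\Delta$ denotes the determinant $|F^TF|$ and that the brackets are the cofactors of the diagonal of $F^TF$; the design points and parameter values are hypothetical, and NumPy is used only for the linear algebra.

```python
import numpy as np

def tanh_jacobian(x, theta):
    """Rows are the gradient (1, v, beta(1 - v^2), beta x (1 - v^2)) at each x_i."""
    _, beta, gamma, delta = theta
    v = np.tanh(gamma + delta * x)
    return np.column_stack([np.ones_like(x), v, beta * (1 - v**2), beta * x * (1 - v**2)])

def closed_form_variances(x, theta, sigma2=1.0):
    """Asymptotic variances of a, b, c, d from the sums A, ..., W."""
    _, beta, gamma, delta = theta
    v = np.tanh(gamma + delta * x)
    n = len(x)
    A, B, E, G = v.sum(), (v**2).sum(), (v**3).sum(), (v**4).sum()
    C, D, H = x.sum(), (x * v**2).sum(), (x * v**4).sum()
    I, M = (x * v).sum(), (x * v**3).sum()
    K, L, N = (x**2).sum(), (x**2 * v**2).sum(), (x**2 * v**4).sum()
    P, Q, R, S = n - B, C - D, A - E, I - M
    T, U, W = n - 2 * B + G, C - 2 * D + H, K - 2 * L + N
    FtF = np.array([[n, A, beta * P, beta * Q],
                    [A, B, beta * R, beta * S],
                    [beta * P, beta * R, beta**2 * T, beta**2 * U],
                    [beta * Q, beta * S, beta**2 * U, beta**2 * W]])
    Delta = np.linalg.det(FtF)  # assumption: Delta = |F^T F|
    brackets = np.array([
        beta**4 * (B*T*W + 2*R*S*U - R**2*W - S**2*T - U**2*B),
        beta**4 * (n*T*W + 2*Q*P*U - T*Q**2 - P**2*W - n*U**2),
        beta**2 * (n*B*W + 2*A*Q*S - B*Q**2 - A**2*W - n*S**2),
        beta**2 * (n*B*T + 2*A*R*P - B*P**2 - A**2*T - n*R**2),
    ])
    return sigma2 * brackets / Delta

# Hypothetical design and parameter vector (alpha, beta, gamma, delta)
x = np.linspace(0.0, 6.0, 10)
theta = (1.0, 2.0, -1.5, 0.5)
F = tanh_jacobian(x, theta)
numeric = np.diag(np.linalg.inv(F.T @ F))
analytic = closed_form_variances(x, theta)
```

For well-conditioned designs both routes agree; the analytic route avoids a numerical matrix inversion inside a simulation loop.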


Example 9.7 — Continued 

Now we can use the oil palm data to estimate the parameters of the hyperbolic 
tangent function with four parameters. The results are shown in Figure 9.16, 
and the estimated regression curve is shown in Figure 9.17. 

The simulation experiments described in Section 9.4.3 to check the tests and confidence estimations based on the asymptotic covariance matrix for small n are performed for $\alpha = \beta = 50$ and

• $\delta = 0.15$ with $\gamma = -2.25$, $-4.5$ and $-6.75$
• $\delta = 0.1$ with $\gamma = -1.5$, $-3$ and $-4.5$
• $\delta = 0.05$ with $\gamma = -0.75$, $-1.5$ and $-2.25$

with normally distributed $e_i$ and n equidistant $x_i$-values in the interval [0, 65]; n = 5(1)15.

It was found that the actual risk $\alpha_{act}$ of tests and confidence estimations differed by at most 20% from the nominal risk $\alpha_{nom} = 0.05$ if at least n = 10 measurements were present. For $\alpha_{nom} = 0.1$ this was already the case from n = 9, but for $\alpha_{nom} = 0.01$ at least 25 measurements were needed.

These results lead us to conjecture that, unlike for three-parametric functions, tests and confidence estimations based on the asymptotic covariance matrix for four-parametric functions cannot be recommended already from n = p + 1 on. This conjecture is supported in the two following sections.
[Figure 9.16: screenshot of the output of the program Growth for the growth function Tanh (4), fitted by the least squares method to the oil palm data with automatic calculation of the initial values; sample size n = 12, confidence limits for α = 0.05; value shown for the estimated residual variance: 0.2238.]



Figure 9.16 LS estimates, asymptotic confidence intervals and the estimated residual 
variance of the hyperbolic tangent function with data of Example 9.7. 


Figure 9.17 Curve of the estimated hyperbolic tangent regression function together with 
the scatter plot of the observations of Example 9.7. 
We finally calculate the locally D-optimal design for the LS estimates in Figure 9.16 and the experimental region and sample size of Example 9.7 and obtain

$$\begin{pmatrix} 1 & 3.39 & 8.36 & 14 \\ 4 & 4 & 3 & 3 \end{pmatrix}.$$

9.6.6 The Arc Tangent Function with Four Parameters 


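We consider the regression model

$$y_i = \alpha + \beta \arctan[\gamma(x_i - \delta)] + e_i = f_A(x_i, \theta) + e_i \quad (i = 1, \ldots, n,\ n > 4,\ \beta \neq 0,\ \gamma > 0,\ \delta \neq 0).$$ (9.75)

The function $f_A(x, \theta)$ has an inflection point at $x_\omega = \delta$ where $f_A(x_\omega, \theta) = \alpha$. $f_A(x, \theta)$ has two asymptotes at $\alpha + \beta\pi/2$ and $\alpha - \beta\pi/2$. We obtain

$$\frac{\partial f_A(x, \theta)}{\partial \theta} = \begin{pmatrix} 1 \\ \arctan[\gamma(x - \delta)] \\ \dfrac{\beta(x - \delta)}{1 + \gamma^2(x - \delta)^2} \\ \dfrac{-\beta\gamma}{1 + \gamma^2(x - \delta)^2} \end{pmatrix}.$$

Writing this in the form (9.1) shows that $\gamma$ and $\delta$ are non-linearity parameters.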
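The gradient of the arc tangent function can be checked against central finite differences. This is a minimal sketch; the function names, the step size and the parameter values are illustrative assumptions.

```python
import math

def f_arctan(x, alpha, beta, gamma, delta):
    """Arc tangent function alpha + beta * arctan(gamma * (x - delta))."""
    return alpha + beta * math.atan(gamma * (x - delta))

def gradient_arctan(x, alpha, beta, gamma, delta):
    """Analytic partial derivatives w.r.t. alpha, beta, gamma, delta."""
    u = x - delta
    v = 1.0 + gamma**2 * u**2
    return [1.0, math.atan(gamma * u), beta * u / v, -beta * gamma / v]

def numerical_gradient(x, theta, h=1e-6):
    """Central finite differences in each of the four parameters."""
    grad = []
    for k in range(4):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        grad.append((f_arctan(x, *up) - f_arctan(x, *down)) / (2 * h))
    return grad

theta = (40.0, 20.0, 0.1, -30.0)   # hypothetical (alpha, beta, gamma, delta)
analytic = gradient_arctan(12.0, *theta)
numeric = numerical_gradient(12.0, theta)
```

The derivatives with respect to $\alpha$ and $\beta$ do not depend on $\alpha$ or $\beta$ themselves, which reflects that only $\gamma$ and $\delta$ enter the function non-linearly.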
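We put

$$u_i = x_i - \delta, \quad v_i = 1 + \gamma^2(x_i - \delta)^2, \quad w_i = \arctan[\gamma(x_i - \delta)]$$

and get

$$F^TF = \begin{pmatrix} n & A & \beta C & -\beta\gamma D \\ A & B & \beta E & -\beta\gamma G \\ \beta C & \beta E & \beta^2 H & -\beta^2\gamma J \\ -\beta\gamma D & -\beta\gamma G & -\beta^2\gamma J & \beta^2\gamma^2 K \end{pmatrix}$$

with

$$A = \sum_{i=1}^n w_i, \quad B = \sum_{i=1}^n w_i^2, \quad C = \sum_{i=1}^n \frac{u_i}{v_i}, \quad D = \sum_{i=1}^n \frac{1}{v_i}, \quad E = \sum_{i=1}^n \frac{u_i w_i}{v_i},$$

$$G = \sum_{i=1}^n \frac{w_i}{v_i}, \quad H = \sum_{i=1}^n \frac{u_i^2}{v_i^2}, \quad J = \sum_{i=1}^n \frac{u_i}{v_i^2}, \quad K = \sum_{i=1}^n \frac{1}{v_i^2}.$$
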
The asymptotic covariance matrix

$$\mathrm{var}_A(\theta) = \sigma^2 (F^TF)^{-1}$$

is estimated by

$$V = s^2 (\hat F^T \hat F)^{-1}.$$

Further

$$s^2 = \frac{1}{n-4} \sum_{i=1}^n \{y_i - a - b \arctan[c(x_i - d)]\}^2.$$

Initial values should be found by internal regression, leading to the LS estimates a, b, c and d for $\alpha$, $\beta$, $\gamma$ and $\delta$. Test statistics and confidence estimations are obtained analogously to the sections above.


Example 9.7 — Continued 

Now we use the oil palm data to estimate the parameters of the arc tangent function with four parameters. The results are shown in Figure 9.18, and the estimated regression curve is shown in Figure 9.19.


[Figure 9.18: screenshot of the output of the program Growth (CADEMO) for the growth function Arc-tan (4), fitted by the least squares method to the oil palm data with automatic calculation of the initial values; sample size n = 12. Confidence intervals for α = 0.05, rows in the order of the program output:]

Estimate    Lower limit    Upper limit
-1.7999     -47.8524       44.2526
21.9330     -37.4669       81.3329
-0.9299     -20.7353       18.8755
 0.1453       0.0413        0.2492

Figure 9.18 LS estimates, asymptotic confidence intervals and the estimated residual variance of the arc tangent function with data of Example 9.7.

Figure 9.19 Curve of the estimated arc tangent regression function together with the scatter plot of the observations of Example 9.7.


The simulation experiments described in Section 9.4.3 to check the tests and confidence estimations based on the asymptotic covariance matrix for small n are performed for $\alpha = \beta = 50$ and $\alpha = 40$, $\beta = 20$ with

$$\gamma = 0.05,\ 0.1,\ 0.2 \quad \text{and} \quad \delta = -50,\ -30,\ -10,$$

normally distributed $e_i$ and n equidistant $x_i$-values in the interval [0, 65] using n = 4(1)20.

It was found that the actual risk $\alpha_{act}$ of tests and confidence estimations differed by at most 20% from the nominal risk $\alpha_{nom} = 0.05$ if at least n = 11 measurements were present. For $\alpha_{nom} = 0.1$ this was already the case from n = 10.

We finally calculate the locally D-optimal design for the LS estimates in Figure 9.18 and the experimental region and sample size of Example 9.7 and obtain

$$\begin{pmatrix} 1 & 2.97 & 7.68 & 14 \\ 4 & 4 & 3 & 3 \end{pmatrix}.$$
9.6.7 The Richards Function
The function $f_R(x, \theta)$ in the regression model


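$$y_i = \alpha\left\{1 + \delta \exp\left[-\frac{\gamma}{\alpha}(\delta + 1)^{1 + 1/\delta}(x_i - \beta)\right]\right\}^{-1/\delta} + e_i = f_R(x_i, \theta) + e_i \quad (i = 1, \ldots, n,\ n > 4,\ \alpha \neq 0,\ \gamma \neq 0,\ \delta \neq 0)$$ (9.76)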


in Richards (1959) was used to model the growth of plants; the parametrisation in (9.76) stems from Schönfelder (1987) and was introduced because the iterative calculation of the LS estimates was relatively easy and the asymptotic covariance matrix proved suitable for tests and confidence estimations. Furthermore, some parameters can be interpreted: $\alpha$ is the value of the asymptote and $\beta$ the x-coordinate of the inflection point.

The parameters $\beta$, $\gamma$, $\delta$ are non-linearity parameters. If $f_R(x, \theta)$ is written in its original form $f_R(x, \theta) = (\alpha^* + \beta^* e^{\gamma^* x})^{\delta^*}$ of Richards (1959), then all parameters are non-linearity parameters.

There are enormous numerical problems with this function, especially in obtaining initial values and in the iterative calculation of the LS estimates. We refer the interested reader to the PhD thesis of Schönfelder (1987), where FORTRAN programs are given.

Tests and confidence estimations have been checked by Schönfelder as described in Section 9.4.3 for equidistant $x_i$ and for the $x_i$ of locally D-optimal designs in [1; 65] and the parameter combinations $(\alpha, \beta, \gamma, \delta)$:

(35; 27; 1; 0.7), (20; 27; 1; 0.7), (35; 15; 1; 0.7), (35; 27; 5; 0.7), (35; 27; 1; -0.5),
(50; 27; 1; 0.7), (35; 45; 1; 0.7), (35; 27; 3; 0.7), (35; 27; 1; 10)

for normally distributed $e_i$.

For $\alpha_{nom} = 0.05$ the tests and confidence estimations based on the asymptotic covariance matrix can be used if n > 14. For the locally D-optimal designs, satisfactory results have been found already with n > 8.


9.6.8 Summarising the Results of Sections 9.6.1-9.6.7 


First we summarise the results of the simulation experiments.

The simulation experiments described in Section 9.4.3 to check the tests and confidence estimations based on the asymptotic covariance matrix suggest the conjecture that for three-parametric regression functions, n = 4 observations are sufficient for the approximation with the asymptotic covariance matrix.

For four-parametric regression functions, the minimal number sufficient for the approximation with the asymptotic covariance matrix depends strongly on the function and on the allocation of the support points. Seldom is n < 10 sufficient.

Next we summarise results concerning locally optimal designs.

For three-parametric regression functions, we conjecture that locally optimal designs are always three-point designs. For four-parametric regression functions, we conjecture that the D-optimal designs are four-point designs.

Paulo and Rasch (2002) investigated the sensitivity of D-optimal designs if parameters differ from the values used in the locally optimal designs and found that the support is relatively insensitive.


9.6.9 Problems of Model Choice 


As we can see from Example 9.7, it is not easy to select a proper regression function for given $(x_i, y_i)$ values. Numerical criteria have been proposed by several authors. Given a class $F = \{f_1(x, \theta), \ldots, f_r(x, \theta)\}$ of functions from which one has to be chosen as the 'best', which of the criteria should be used? Rasch and van Wijk (1994) considered in a simulation experiment the r = 8 functions handled in Sections 9.6.1-9.6.7 and five criteria. In the following, $f_j(x_i, \hat\theta)$ are the values at $x_i$ of the function $f_j(x, \theta) \in F$ fitted to the $(x_i, y_i)$ by the LS method, and $p_j$ is the number of estimated parameters.
The criteria are ($n > p_j$, $j = 1, \ldots, r$):

K1: $s_j^2$ (residual variance by fitting $f_j(x, \theta) \in F$), with

$$s_j^2 = \frac{1}{n - p_j} \sum_{i=1}^n \left[y_i - f_j(x_i, \hat\theta)\right]^2$$

K2: $C_{p_j} = \dfrac{(n - p_j)\, s_j^2}{\hat\sigma^2} + 2p_j - n$ ($C_p$ criterion in Mallows (1973))

$\hat\sigma^2$ is an estimate of $\sigma^2$ different from $s_j^2$. This could be the $MS_{res}$ in a simple analysis of variance, if several measurements at the support points are available.


K3: Jackknife criterion

Drop in $(x_i, y_i)$ ($i = 1, \ldots, n$) the l-th pair ($l = 1, \ldots, n$). With the n - 1 remaining data pairs, the functions $f_j(x, \theta) \in F$ are fitted. Let $\hat y_l(j) = f_j(x_l, \hat\theta_{(l)})$ be the value of the fitted function at $x_l$. Then


Ky = p1-P 


is the value of the Jackknife criterion. The name is chosen analogously to the 
Jackknife estimate (Chapter 4). 
K4: Modified Akaike criterion (Akaike, 1974)

With $S_j^2 = \dfrac{n - p_j}{n}\, s_j^2$ it is

$$AIC_j = n \ln(S_j^2) + \frac{n(n + p_j)}{n - p_j - 2}.$$

K5: Schwarz criterion (Schwarz, 1978)

With $T_j^2 = \dfrac{n - p_j}{n}\, s_j^2$ it is

$$SC_j = n \ln(T_j^2) + p_j \ln(n).$$



Rasch and van Wijk (1994) found the Jackknife version of the modified Akaike criterion to be the best one. In the simulation experiment, values were generated with each function in F, and error terms were added to the function values. By the LS method each function was fitted to each generated data set. This was repeated 5000 times. An 8 × 8 matrix shows how often a function (column) was selected by a criterion as the best fit to data of a generating function (row). A heavy main diagonal of this matrix is of course ideal for a criterion.
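The criteria K1, K2, K4 and K5 depend on the data only through the residual sum of squares of the fitted function, so they can be sketched compactly. The sketch below assumes $S_j^2 = T_j^2 = (n - p_j)s_j^2/n$ and the formulas for $AIC_j$ and $SC_j$ as read from the text; for K3 it only produces the leave-one-out prediction errors $y_l - \hat y_l(j)$ and makes no assumption about how they are aggregated. All names, including the straight-line stand-in `fit_line`, are hypothetical.

```python
import math

def selection_criteria(rss, n, p, sigma2_hat):
    """K1, K2, K4, K5 for one fitted function with residual sum of squares rss."""
    s2 = rss / (n - p)                                   # K1: residual variance
    cp = (n - p) * s2 / sigma2_hat + 2 * p - n            # K2: Mallows' C_p
    ml_var = (n - p) / n * s2                             # S_j^2 = T_j^2 = rss / n
    aic = n * math.log(ml_var) + n * (n + p) / (n - p - 2)  # K4: modified Akaike
    sc = n * math.log(ml_var) + p * math.log(n)           # K5: Schwarz
    return {"K1": s2, "K2": cp, "K4": aic, "K5": sc}

def jackknife_errors(x, y, fit):
    """Leave-one-out errors; `fit(x, y)` must return a prediction function."""
    errors = []
    for l in range(len(x)):
        predictor = fit(x[:l] + x[l + 1:], y[:l] + y[l + 1:])
        errors.append(y[l] - predictor(x[l]))
    return errors

def fit_line(xs, ys):
    """Stand-in fitting routine (straight line); replace by a non-linear LS fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return lambda x0: my + slope * (x0 - mx)

criteria = selection_criteria(rss=2.0, n=12, p=3, sigma2_hat=0.25)
loo = jackknife_errors([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0], fit_line)
```

In a model comparison the function would be called once per candidate $f_j$, and the candidate with the smallest value of the chosen criterion would be selected.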


9.7. Exercises 


9.1 


9.2 


Which of the regression functions below are linear, quasilinear or intrin- 
sically non-linear? 
a) flx, 0) = 0, + Oox + 03x" 


b) f(x, 0) = 01 +023 


c) Six, 9) = a2 04x41 + 05x 

d) f(x, he Ox 

€) fx, A) = 01%) + OaX2 + 3x %2 
S (%,0) =O, +e 


f) 
02x 
0)=0 
8) £(%8) ae ore 


Determine the non-linearity parameter(s) of the following regression 


functions: 
a) flx, 0) = 0, + sin(O2x + 03) 


Ox 
59) = 
c) f («,0) = 0,e%** % 
d) f (x,0) = 0, + 022” 
1 
e) Ff (#0) =O, +O2— + 03(1+e™%)” 
9.3 Fit the exponential regression function

$$y = f(x, \theta) + e = \alpha + \beta e^{\gamma x} + e, \quad \theta = (\alpha, \beta, \gamma)^T;\ \gamma < 0$$

to the data below!

Time    0     1     2      3      4      5      6      7      8      9      10
Value   77.2  94.5  107.2  116.0  122.4  126.7  129.2  129.9  130.4  130.8  131.2

Calculate the estimates a, b, c for the parameters by the LS method. Give further an estimate for the variance $\sigma^2$.


9.4 For the no-load loss of a generator in dependence on the voltage, the following measurements are available:

Voltage X       230    295    360    425    490    555    620
Loss L (kW)     64.0   66.0   69.5   74.0   80.8   91.0   103.5

Which of the regression models in Section 9.6 fits best?
Find estimates for the parameters of the best fitting function by the LS method and give an estimate for the error variance.
Analysis of Covariance (ANCOVA)


10.1 Introduction 


Analysis of covariance (ANCOVA) as a branch of applied statistics covers several objectives. In any case, the observations are influenced by at least two factors. At least one of these factors has several levels, by which the material is classified into classes. At least one further factor is a regressor in a regression model between different variables in the model and is called a covariable or a covariate. One branch of the ANCOVA is to test whether the influence of the covariable is significant and, if so, to eliminate it.

If the factor is qualitative (not numeric), this target can be achieved simply by blocking and using analysis of variance (ANOVA). Another branch of the ANCOVA is to estimate the parameters of the regression model within the classes of the classification factor.

If we have just one classification factor and one covariable, then we have four 
models of the ANCOVA: 


Model I-I: Levels of the classification factor fixed and model I of regression 

Model I-II: Levels of the classification factor fixed and model II of regression 

Model II-I: Levels of the classification factor random and model I of regression 

Model II-II: Levels of the classification factor random and model II of 
regression 


In theoretical statistical textbooks, mainly model I-I is presented. In applications and in many examples, however, one finds almost exclusively cases for which model I-II must be used, and the results found for model I-I are applied to model I-II. Real practical examples for model I-I can hardly be found. Graybill (1961) bypasses this difficulty by using a fictitious numerical example.

Searle (1971, 2012) uses as levels of the classification factor three kinds of school education of fathers of a family and the number of children in the family as a covariable. The observed trait is the amount of expenditures. Here a model of an incomplete two-way cross-classification could certainly be used, and the question whether the covariable ‘number of children’ leads to model I or II of the regression analysis depends on data collection. Scheffé (1953) considered an introductory example with the classification factor ‘kind of starch’ and the thickness of the starch strata as covariable. This is an example where model I-II can be used, although afterwards he discusses model I-I. However, Scheffé is one of the few authors who recognise and discuss the two models. He gave a heuristic rationale for applying the results derived for model I-I to model I-II. The background for this is the applicability of the estimation and test methods of model I of regression to model II of regression as described in Chapter 8.

In the text below exclusively model I-I as special case of model equation (4.1) is considered. Model II-II corresponds with problems treated in Chapter 6 (estimation of variance and covariance components).


10.2 General Model I-I of the Analysis of Covariance 
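We consider the following special case.

Definition 10.1 If $X\beta$ in (5.1) with the assumptions of Definition 5.1 can be written as

$$X\beta = W\alpha + Z\gamma$$

with

$X = (W, Z)$, $\beta^T = (\alpha^T, \gamma^T)$,

$X$: $[N \times (a + 1)]$ matrix of rank $p < a + 1$,

$W$: $[N \times (t + 1)]$ matrix of rank $r$, $0 < r < t + 1 < a + 1$,

$W = (1_N, W^*)$,

$Z$: $(N \times s)$ matrix of rank $s$, $0 < s < p < a + 1$,

$\alpha = (\mu, \alpha_1, \ldots, \alpha_t)^T$, $\gamma = (\gamma_1, \ldots, \gamma_s)^T$ $(t + s = a,\ r + s = p)$,

then the model equation

$$Y = W\alpha + Z\gamma + e, \quad \Omega = R[W] \oplus R[Z]$$ (10.1)

under the distributional assumption

$$Y: N(W\alpha + Z\gamma, \sigma^2 I_N), \quad e: N(0_N, \sigma^2 I_N)$$ (10.2)

is called a model I-I of the ANCOVA. Normality is only needed for testing and confidence estimation.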
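The columns of Z define the covariable.
First we will give an example.

Example 10.1 In a populations $G_1, \ldots, G_a$ independent random vectors $Y_1, \ldots, Y_a$ of size $n_1, \ldots, n_a$ are available. Let $Y_i = (y_{i1}, \ldots, y_{in_i})^T$. In $G_i$ the $Y_i$ are $N(\{\mu_{ij}\}, \sigma^2 I_{n_i})$-distributed with $\{\mu_{ij}\} = (\mu_{i1}, \ldots, \mu_{in_i})^T$.

Case (a). $\mu_{ij}$ can be written as

$$\mu_{ij} = \mu + \alpha_i + \gamma z_{ij} \quad (i = 1, \ldots, a;\ j = 1, \ldots, n_i).$$

The $z_{ij}$ are given values of a real (influence) variable Z, the covariable of the model.

Case (b). $\mu_{ij}$ can be written as

$$\mu_{ij} = \mu + \alpha_i + \gamma_i z_{ij} \quad (i = 1, \ldots, a;\ j = 1, \ldots, n_i).$$

Then (10.1) has the special form:

Case (a):

$$y_{ij} = \mu + \alpha_i + \gamma z_{ij} + e_{ij} \quad (i = 1, \ldots, a;\ j = 1, \ldots, n_i)$$ (10.3)

Case (b):

$$y_{ij} = \mu + \alpha_i + \gamma_i z_{ij} + e_{ij} \quad (i = 1, \ldots, a;\ j = 1, \ldots, n_i).$$ (10.4)

In (10.3) and (10.4) (as special case of (10.1)),

$$W^T = \begin{pmatrix} 1\,1 \cdots 1 & 1\,1 \cdots 1 & \cdots & 1\,1 \cdots 1 \\ 1\,1 \cdots 1 & 0\,0 \cdots 0 & \cdots & 0\,0 \cdots 0 \\ 0\,0 \cdots 0 & 1\,1 \cdots 1 & \cdots & 0\,0 \cdots 0 \\ \vdots & \vdots & & \vdots \\ \underbrace{0\,0 \cdots 0}_{n_1} & \underbrace{0\,0 \cdots 0}_{n_2} & \cdots & \underbrace{1\,1 \cdots 1}_{n_a} \end{pmatrix},$$

$\alpha = (\mu, \alpha_1, \ldots, \alpha_a)^T$, $Y^T = (Y_1^T, \ldots, Y_a^T)$ and $e^T = (e_{11}, \ldots, e_{an_a})$.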
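In (10.3) $Z^T = (z_{11}, \ldots, z_{an_a})$ and $\gamma$ is a scalar. In (10.4) $\gamma^T = (\gamma_1, \gamma_2, \ldots, \gamma_a)$ and

$$Z = \begin{pmatrix} z_{11} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ z_{1n_1} & 0 & \cdots & 0 \\ 0 & z_{21} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & z_{2n_2} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & z_{a1} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & z_{an_a} \end{pmatrix}.$$

Example 10.2 We consider the situation of Example 10.1 with a populations, but for $\mu_{ij}$ we use the model equation

$$\mu_{ij} = \mu + \alpha_i + \gamma_1 z_{ij} + \gamma_2 z_{ij}^2 + \cdots + \gamma_s z_{ij}^s \quad (i = 1, \ldots, a;\ j = 1, \ldots, n_i),$$

so that (10.1) has the special form

$$y_{ij} = \mu + \alpha_i + \gamma_1 z_{ij} + \gamma_2 z_{ij}^2 + \cdots + \gamma_s z_{ij}^s + e_{ij}.$$ (10.5)

Y, $\alpha$ and e are given in Example 10.1 with $\gamma^T = (\gamma_1, \gamma_2, \ldots, \gamma_s)$ and

$$Z = \begin{pmatrix} z_{11} & z_{11}^2 & \cdots & z_{11}^s \\ z_{12} & z_{12}^2 & \cdots & z_{12}^s \\ \vdots & \vdots & & \vdots \\ z_{an_a} & z_{an_a}^2 & \cdots & z_{an_a}^s \end{pmatrix}.$$

With Examples 10.1 and 10.2, all typical problems of the ANCOVA model I-I can be illustrated:

• Testing the hypothesis $H_0: \alpha_1 = \cdots = \alpha_a$
• Testing the hypothesis $H_0: \gamma = 0$ (Example 10.1, Case (a))
• Testing the hypothesis $H_0: \gamma_1 = \cdots = \gamma_a$ (Example 10.1, Case (b))
• Testing the hypothesis $H_0: \gamma_r = \gamma_{r+1} = \cdots = \gamma_s = 0$ for $2 \le r \le s - 1$ (Example 10.2)
• Estimation of $\gamma$ or $\gamma_1, \ldots, \gamma_s$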

Going back to the general case, we note that with W also X does not have full rank. Therefore $X^TX$ is singular, and the normal equations in Section 4.1

$$X^TX\beta^* = X^TY$$

have no unique solution.
For model I-I of the ANCOVA, due to

$$X = (W, Z), \quad \beta^T = (\alpha^T, \gamma^T),$$

the normal equations have the form

$$\begin{pmatrix} W^TW & W^TZ \\ Z^TW & Z^TZ \end{pmatrix} \begin{pmatrix} \alpha^* \\ \gamma^* \end{pmatrix} = \begin{pmatrix} W^TY \\ Z^TY \end{pmatrix}.$$ (10.6)

Let $G_W$ be a generalised inverse of $W^TW$; then from (10.6) we obtain

$$\alpha^* = G_W(W^TY - W^TZ\gamma^*).$$ (10.7)

If $\alpha^{**}$ is the solution of the normal equations of model equation (10.1) for $\gamma = 0$ (without covariable), then

$$\alpha^* = \alpha^{**} - G_WW^TZ\gamma^*$$ (10.8)

where $\alpha^{**} = G_WW^TY$ is the solution of the normal equations for an ANOVA (model I) with model equation (5.1). The formula for $\alpha^{**}$ is, up to the notation, identical with (5.3). If we insert $\alpha^*$ in (10.6), we obtain for $\gamma^*$ formula (10.10). In the following theorem we show that $\gamma^*$ is uniquely determined and BLUE of the estimable function $\gamma$.

Theorem 10.1 Let model equation (10.1) and the distributional assumption (10.2) be valid for Y. Then $\gamma$ is estimable and the solution $\gamma^*$ of the normal equations (10.6) is unique (i.e. independent of the special choice of $G_W$) and BLUE of $\gamma$. We therefore write $\gamma^* = \hat\gamma$.

Proof: At first we show that $\gamma^*$ is unique and write (10.6) in detail as

$$W^TW\alpha^* + W^TZ\gamma^* = W^TY$$
$$Z^TW\alpha^* + Z^TZ\gamma^* = Z^TY$$

or using (10.8) and $W^TWG_WW^T = W^T$ as

$$W^TW\alpha^{**} = W^TY$$
$$Z^TW\alpha^{**} - Z^TWG_WW^TZ\gamma^* + Z^TZ\gamma^* = Z^TY.$$ (10.9)

Due to Theorem 4.13 the relation $\alpha^{**} = G_WW^TY$ follows from the first equation of (10.9), and inserting this into the second equation of (10.9) leads to

$$Z^TWG_WW^TY - Z^TWG_WW^TZ\gamma^* + Z^TZ\gamma^* = Z^TY.$$

With the idempotent matrix $A = I_N - WG_WW^T$, this gives the solution

$$\gamma^* = (Z^TAZ)^- Z^TAY,$$

where $(Z^TAZ)^-$ is a generalised inverse of $Z^TAZ$. Because A is idempotent, we have $\mathrm{rk}(AZ) = \mathrm{rk}(Z^TAZ)$. Further AZ has full column rank, so that $Z^TAZ$ is non-singular and $(Z^TAZ)^- = (Z^TAZ)^{-1}$. Therefore

$$\gamma^* = \hat\gamma = (Z^TAZ)^{-1}Z^TAY$$ (10.10)

is the unique solution component of (10.6).
From Lemma 4.1 we know that $\gamma$ is estimable if it is a linear function of E(Y). From (10.10) we get

$$E(\hat\gamma) = (Z^TAZ)^{-1}Z^TAE(Y) = (Z^TAZ)^{-1}Z^TA(W\alpha + Z\gamma).$$

Because $AW = (I_N - WG_WW^T)W = W - W = O$ we obtain $E(\hat\gamma) = \gamma$. Therefore $\hat\gamma$ is BLUE of $\gamma$, and this completes the proof.
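Formula (10.10) can be tried out numerically. The sketch below uses the Moore-Penrose inverse `np.linalg.pinv` as one particular choice of the generalised inverse $G_W$ (an assumption; any generalised inverse gives the same $\hat\gamma$ by Theorem 10.1) and cross-checks the result against a least squares solution of the full model. The one-way layout and the data are hypothetical.

```python
import numpy as np

def ancova_gamma_hat(W, Z, Y):
    """gamma_hat = (Z^T A Z)^{-1} Z^T A Y with A = I_N - W G_W W^T, see (10.10)."""
    G_W = np.linalg.pinv(W.T @ W)            # one generalised inverse of W^T W
    A = np.eye(W.shape[0]) - W @ G_W @ W.T   # idempotent, A W = O
    return np.linalg.solve(Z.T @ A @ Z, Z.T @ A @ Y), A

# Hypothetical one-way layout with a = 3 classes and one covariable
classes = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
z = np.array([1.0, 2.0, 4.0, 2.0, 3.0, 5.0, 6.0, 1.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 9.2, 6.0, 8.1, 11.8, 14.2, 1.9, 6.1, 8.0])

# W = (1_N, W*) with class indicator columns; W does not have full rank
W = np.column_stack([np.ones(10)] + [(classes == i).astype(float) for i in range(3)])
Z = z.reshape(-1, 1)
gamma_hat, A = ancova_gamma_hat(W, Z, y)

# Cross-check: minimum-norm least squares solution for X = (W, Z)
beta_star, *_ = np.linalg.lstsq(np.column_stack([W, Z]), y, rcond=None)
```

Although the solution for $\alpha$ depends on the generalised inverse chosen, the last component of `beta_star` coincides with `gamma_hat`, in line with the uniqueness statement of the theorem.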


Corollary 10.1 The estimator $\hat\gamma$ of $\gamma$ in model equation (10.1) is a BLUE concerning $\gamma$ in the model equation

$$Y = AZ\gamma + e$$ (10.11)

with $A = I_N - WG_WW^T$ where Y, W, Z and e fulfil the conditions of Definition 10.1.

This follows from the idempotence of A and the results of Chapter 8.

We now test the hypotheses

$$H_0: M^T\gamma = c, \quad \mathrm{rk}(M^T) = v < s$$

and

$$H_0: L^T\alpha = R, \quad \mathrm{rk}(L^T) = u < r.$$

Because $\gamma$ is estimable,

$$H_0: M^T\gamma = c$$

is testable. For the second hypothesis only such matrices L are admitted for which the rows of $L^T\alpha$ are estimable. From (10.8) it follows that $L^T\alpha$ with $\alpha$ in (10.1) is estimable if $L^T\alpha$ is estimable in the corresponding model of the ANOVA (with $\gamma = 0$).

In Chapter 4 the F-test statistic (4.37) was used as a test statistic of the hypothesis $H_0: \gamma = 0$. From the results of Chapter 4, we can now derive a test statistic for the general linear hypothesis $H_0: M^T\gamma = c$, with a $(v \times s)$ matrix $M^T$ with $v < s$ and of rank v. The hypothesis $H_0: M^T\gamma = c$ defines a subspace $\omega_\gamma \subset \Omega$ of dimension $s - v$.
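We write $M^T\gamma = c$ as a special case of the general hypothesis $K^T\beta = a$ in model (4.29). To test this hypothesis we obtain, with $p = \mathrm{rk}(X)$ and $q = \mathrm{rk}(K)$ and if $H_0: K^T\beta = a$ is true, the F-distributed test statistic

$$F = \frac{(K^T\beta^* - a)^T [K^T(X^TX)^-K]^{-1} (K^T\beta^* - a)}{Y^T(I_N - X(X^TX)^-X^T)Y} \cdot \frac{N - p}{q}.$$ (10.12)

We now use

$$K^T = (O_{v,t+1}, M^T), \quad \beta = \begin{pmatrix} \alpha \\ \gamma \end{pmatrix}, \quad X = (W, Z), \quad a = c$$

with a $(v \times s)$ matrix $M^T$ of rank $v < s$. For $(X^TX)^-$ we write

$$(X^TX)^- = \begin{pmatrix} W^TW & W^TZ \\ Z^TW & Z^TZ \end{pmatrix}^-,$$ (10.13)

and this becomes with $G_W = (W^TW)^-$ and with $D = Z^T(I_N - WG_WW^T)Z = Z^TAZ$

$$(X^TX)^- = \begin{pmatrix} G_W + G_WW^TZD^{-1}Z^TWG_W & -G_WW^TZD^{-1} \\ -D^{-1}Z^TWG_W & D^{-1} \end{pmatrix}.$$ (10.14)

Therefore,

$$K^T(X^TX)^-K = M^T(Z^TAZ)^{-1}M$$

is non-singular ($M^T$ has full row rank) and

$$X(X^TX)^-X^T = WG_WW^T + AZ(Z^TAZ)^{-1}Z^TA.$$ (10.15)

That means that (10.12) for our special case becomes

$$F = \frac{(M^T\hat\gamma - c)^T [M^T(Z^TAZ)^{-1}M]^{-1} (M^T\hat\gamma - c)}{Y^T(I_N - X(X^TX)^-X^T)Y} \cdot \frac{N - r - s}{v}.$$ (10.16)

F is $F(v, N - r - s, \lambda)$-distributed with the non-centrality parameter

$$\lambda = \frac{1}{\sigma^2}(M^T\gamma - c)^T [M^T(Z^TAZ)^{-1}M]^{-1} (M^T\gamma - c).$$

Analogously we obtain the test statistic for the special hypothesis $H_0: L^T\alpha = R$ as

$$F = \frac{(L^T\alpha^* - R)^T \{L^T[G_W + G_WW^TZ(Z^TAZ)^{-1}Z^TWG_W]L\}^{-1} (L^T\alpha^* - R)}{Y^T(I_N - X(X^TX)^-X^T)Y} \cdot \frac{N - r - s}{u}$$ (10.17)
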
if we put $K^T = (L^T, O_{u,s})$, where $\mathrm{rk}(L^T) = u < r$. The hypothesis $H_0: L^T\alpha = R$ corresponds with a subspace $\omega_\alpha \subset \Omega$ of dimension $r - u$ and F in (10.17) is $F(u, N - r - s, \lambda)$-distributed with the non-centrality parameter

$$\lambda = \frac{1}{\sigma^2}(L^T\alpha - R)^T \{L^T[G_W + G_WW^TZ(Z^TAZ)^{-1}Z^TWG_W]L\}^{-1} (L^T\alpha - R).$$

For the special case $R = 0_u$, we write (10.12) as done in this section for model equation (10.1) where $\mathrm{rk}(L^T) = r - 1$ and $\gamma = 0_s$. In (10.12) we replace X by W, K by L, $\beta^*$ by $\alpha^{**}$, a by $0_{r-1}$, q by $r - 1$ and p by r and obtain

$$F = \frac{\alpha^{**T}L(L^TG_WL)^{-1}L^T\alpha^{**}}{Y^T(I_N - WG_WW^T)Y} \cdot \frac{N - r}{r - 1}$$ (10.12a)

as test statistic for the test of the null hypothesis $H_0: L^T\alpha = 0$ in model (10.1), if $\gamma = 0$.
If we now consider the hypothesis

$$H_0: L^T(\alpha + G_WW^TZ\gamma) = 0$$

for the general model (10.1) and observe

$$L^T(\alpha + G_WW^TZ\gamma) = K^T\begin{pmatrix} \alpha \\ \gamma \end{pmatrix} = K^T\beta = 0$$

with $K^T = L^T(I_{t+1}, G_WW^TZ)$ in (10.12) for $a = 0_{r-1}$, we obtain a test statistic with numerator SS

$$\beta^{*T}K[K^T(X^TX)^-K]^{-1}K^T\beta^*$$

and we show that this becomes $\alpha^{**T}L(L^TG_WL)^{-1}L^T\alpha^{**}$. This is true because $K^T(X^TX)^-K = L^TG_WL$ after decomposing $(X^TX)^-$ as in (10.13) and because

$$K^T\beta^* = L^T(I_{t+1}, G_WW^TZ)\begin{pmatrix} \alpha^* \\ \gamma^* \end{pmatrix} = L^T(\alpha^* + G_WW^TZ\gamma^*) = L^T\alpha^{**}.$$

By this we get an F-statistic for testing the hypothesis $H_0: L^T\alpha = 0$ in (10.1) if $\gamma = 0_s$.

In (10.12a) we get the same numerator SS to test the hypothesis $H_0: L^T(\alpha + G_WW^TZ\gamma) = 0$ for model (10.1).

We use this in Section 10.3.
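10.3 Special Models of the Analysis of Covariance for the Simple Classification

The general formulae derived in Section 10.2 are now given explicitly for special cases.

Definition 10.2 Model equation (10.1) with the side conditions (10.2) is the model of the simple (one-way) classification of the analysis of covariance if in (10.1)

$$W^T = \begin{pmatrix} 1\,1 \cdots 1 & 1\,1 \cdots 1 & \cdots & 1\,1 \cdots 1 \\ 1\,1 \cdots 1 & 0\,0 \cdots 0 & \cdots & 0\,0 \cdots 0 \\ 0\,0 \cdots 0 & 1\,1 \cdots 1 & \cdots & 0\,0 \cdots 0 \\ \vdots & \vdots & & \vdots \\ 0\,0 \cdots 0 & 0\,0 \cdots 0 & \cdots & 1\,1 \cdots 1 \end{pmatrix}$$

and $\alpha^T = (\mu, \alpha_1, \ldots, \alpha_a)$ is chosen.
From Theorem 5.3 we know that the denominator of (10.12a) is

$$Y^TAY = Y^T(I_N - WG_WW^T)Y = \sum_{i=1}^a \sum_{j=1}^{n_i} y_{ij}^2 - \sum_{i=1}^a \frac{1}{n_i}Y_{i.}^2 = SS_{res,y}.$$ (10.18)

Analogously we write with $Z^T = (z_{11}, \ldots, z_{an_a})$

$$Z^TAZ = \sum_{i=1}^a \sum_{j=1}^{n_i} z_{ij}^2 - \sum_{i=1}^a \frac{1}{n_i}Z_{i.}^2 = SS_{res,z}$$ (10.19)

and

$$Z^TAY = \sum_{i=1}^a \sum_{j=1}^{n_i} y_{ij}z_{ij} - \sum_{i=1}^a \frac{1}{n_i}Y_{i.}Z_{i.} = SP_{res}.$$ (10.20)

Further is

$$SS_{total,z} = Z^TZ - \frac{1}{N}Z_{..}^2 = Z^TAZ + \sum_{i=1}^a \frac{1}{n_i}Z_{i.}^2 - \frac{1}{N}Z_{..}^2 = SS_z + SS_{res,z}$$ (10.21)

and

$$SP_{total} = Z^TY - \frac{1}{N}Z_{..}Y_{..} = Z^TAY + \sum_{i=1}^a \frac{1}{n_i}Z_{i.}Y_{i.} - \frac{1}{N}Z_{..}Y_{..} = SP_{between} + SP_{res}.$$ (10.22)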


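Here $Y^T = (Y_1^T, \ldots, Y_a^T)$ is a vector with a independent random samples $Y_i$, and the elements of the ith sample are $N(\mu + \alpha_i, \sigma^2)$-distributed.

Definition 10.2 is still rather general. Below we discuss special cases of Example 10.1. Let $\gamma$ be a scalar and $Z^T = z^T = (z_{11}, \ldots, z_{1n_1}, \ldots, z_{a1}, \ldots, z_{an_a})$ a row vector, such that (10.1) becomes (10.3). If $\gamma^T = (\gamma_1, \ldots, \gamma_a)$ and

$$Z = \bigoplus_{i=1}^a Z_i \quad \text{with} \quad Z_i = (z_{i1}, \ldots, z_{in_i})^T \quad (i = 1, \ldots, a),$$

then (10.1) becomes (10.4). If $\gamma^T = (\gamma_{11}, \ldots, \gamma_{1n_1}, \ldots, \gamma_{a1}, \ldots, \gamma_{an_a})$ and

$$Z = \bigoplus_{i=1}^a \bigoplus_{j=1}^{n_i} z_{ij},$$

then (10.1) becomes

$$y_{ij} = \mu + \alpha_i + \gamma_{ij} z_{ij} + e_{ij}.$$ (10.23)

If

$$Z^T = \begin{pmatrix} u_{11} & \cdots & u_{1n_1} & \cdots & u_{a1} & \cdots & u_{an_a} \\ v_{11} & \cdots & v_{1n_1} & \cdots & v_{a1} & \cdots & v_{an_a} \end{pmatrix} \quad \text{and} \quad \gamma = \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix},$$

(10.1) becomes

$$y_{ij} = \mu + \alpha_i + \gamma_1 u_{ij} + \gamma_2 v_{ij} + e_{ij}.$$ (10.24)

Finally we consider the case of Example 10.2. With Z and $\gamma$ in Example 10.2, Equation (10.1) becomes (10.5). In applications mainly (10.3) and (10.4) are used, and these cases are discussed below.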


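10.3.1 One Covariable with Constant γ

In model equation (10.3) one covariable z occurs, and the factor $\gamma$ is the same for all values $z_{ij}$ of this covariable. The BLUE $\hat\gamma$ of $\gamma$ is [see (10.10) and (10.18) to (10.20)] given by

$$\hat\gamma = \frac{\sum_{i=1}^a \sum_{j=1}^{n_i} y_{ij}z_{ij} - \sum_{i=1}^a \frac{1}{n_i}Y_{i.}Z_{i.}}{\sum_{i=1}^a \sum_{j=1}^{n_i} z_{ij}^2 - \sum_{i=1}^a \frac{1}{n_i}Z_{i.}^2} = \frac{SP_{res}}{SS_{res,z}}.$$ (10.25)

Formula (10.25) is the estimator of the regression coefficient within classes.
We shall test:

a) The null hypothesis $H_0: \gamma = 0$
b) The null hypothesis $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a$
c) The null hypothesis $H_0: \alpha_1 + \gamma\bar z_{1.} = \alpha_2 + \gamma\bar z_{2.} = \cdots = \alpha_a + \gamma\bar z_{a.}$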
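a) To test the hypothesis $H_0: \gamma = 0$, we use the test statistic (10.16) with M = 1, c = 0. By (10.15) and (10.18) to (10.20) its denominator is $SS_{res} = SS_{res,y} - SP_{res}^2/SS_{res,z}$. If $\hat\gamma$ is taken from (10.25) and because $Z^TAZ = SS_{res,z}$, formula (10.16) becomes

$$F = \frac{\hat\gamma\, Z^TAZ\, \hat\gamma\, (N - a - 1)}{SS_{res}} = \frac{SP_{res}^2\, (N - a - 1)}{SS_{res,z}\, SS_{res}} = t^2.$$ (10.26)

F in (10.26) is $F(1, N - a - 1, \lambda)$-distributed with $\lambda = \dfrac{\gamma^2}{\sigma^2} SS_{res,z}$. If $H_0$ is true, $t = \sqrt{F}$ is $t(N - a - 1)$-distributed.

b) To test the hypothesis $H_0: \alpha_1 = \cdots = \alpha_a$, we use a special case of (10.17) with $R = 0_{a-1}$ and

$$L^T = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}.$$ (10.27)

$L^T$ is a $[(a - 1) \times a]$ matrix of rank $a - 1$. Because $(W^TY)^T = (Y_{..}, Y_{1.}, \ldots, Y_{a.})$ and correspondingly $(W^TZ)^T = (Z_{..}, Z_{1.}, \ldots, Z_{a.})$ and by (10.8), we obtain

$$\alpha^{*T} = (\bar y_{1.} - \hat\gamma\bar z_{1.}, \ldots, \bar y_{a.} - \hat\gamma\bar z_{a.}),$$

so that (10.17) becomes

$$F = \frac{\left(SS_{total,y} - \dfrac{SP_{total}^2}{SS_{total,z}}\right) - \left(SS_{res,y} - \dfrac{SP_{res}^2}{SS_{res,z}}\right)}{SS_{res,y} - \dfrac{SP_{res}^2}{SS_{res,z}}} \cdot \frac{N - a - 1}{a - 1}.$$ (10.28)

If the null hypothesis $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a$ is true, F is $F(a - 1, N - a - 1)$-distributed.

c) To test the null hypothesis $H_0: \alpha_1 + \gamma\bar z_{1.} = \alpha_2 + \gamma\bar z_{2.} = \cdots = \alpha_a + \gamma\bar z_{a.}$, we write $H_0$ with $L^T$ from (10.27) as

$$H_0: L^T(\alpha + G_WW^TZ\gamma) = 0_{a-1}.$$

The test statistic of this hypothesis has the numerator SS given in (10.12a), and because $\mathrm{rk}(L^T) = a - 1$, it is equal to

$$F = \frac{\sum_{i=1}^a \dfrac{Y_{i.}^2}{n_i} - \dfrac{Y_{..}^2}{N}}{\sum_{i=1}^a \sum_{j=1}^{n_i} y_{ij}^2 - \sum_{i=1}^a \dfrac{Y_{i.}^2}{n_i} - \dfrac{SP_{res}^2}{SS_{res,z}}} \cdot \frac{N - a - 1}{a - 1}$$ (10.29)

with $SP_{res}$ in (10.20) and $SS_{res,z}$ in (10.19).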
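The quantities in (10.25), (10.26), (10.28) and (10.29) only need the class sums of squares and products, so they can be sketched without matrix algebra. The sketch assumes that the residual sum of squares in all three denominators is $SS_{res,y} - SP_{res}^2/SS_{res,z}$; the function name and the data are hypothetical.

```python
def one_way_ancova(groups):
    """groups: list of classes, each a list of (z, y) pairs. Returns gamma_hat and F values."""
    a = len(groups)
    all_z = [z for g in groups for z, _ in g]
    all_y = [y for g in groups for _, y in g]
    n = len(all_z)
    ss_res_z = ss_res_y = sp_res = 0.0
    for g in groups:
        n_i = len(g)
        sz, sy = sum(z for z, _ in g), sum(y for _, y in g)
        ss_res_z += sum(z * z for z, _ in g) - sz * sz / n_i      # (10.19)
        ss_res_y += sum(y * y for _, y in g) - sy * sy / n_i      # (10.18)
        sp_res += sum(z * y for z, y in g) - sz * sy / n_i        # (10.20)
    ss_tot_z = sum(z * z for z in all_z) - sum(all_z) ** 2 / n
    ss_tot_y = sum(y * y for y in all_y) - sum(all_y) ** 2 / n
    sp_tot = sum(z * y for z, y in zip(all_z, all_y)) - sum(all_z) * sum(all_y) / n
    gamma_hat = sp_res / ss_res_z                                  # (10.25)
    ss_res = ss_res_y - sp_res ** 2 / ss_res_z                     # residual SS of the ANCOVA
    f_gamma = sp_res ** 2 / ss_res_z / ss_res * (n - a - 1)        # (10.26)
    ss_tot_adj = ss_tot_y - sp_tot ** 2 / ss_tot_z
    f_alpha = (ss_tot_adj - ss_res) / ss_res * (n - a - 1) / (a - 1)           # (10.28)
    f_means = (ss_tot_y - ss_res_y) / ss_res * (n - a - 1) / (a - 1)           # (10.29)
    return {"gamma_hat": gamma_hat, "F_gamma": f_gamma,
            "F_alpha": f_alpha, "F_means": f_means, "SS_res": ss_res}

# Hypothetical data: three classes of (z, y) pairs
groups = [
    [(1.0, 3.1), (2.0, 4.9), (4.0, 9.2)],
    [(2.0, 6.0), (3.0, 8.1), (5.0, 11.8), (6.0, 14.2)],
    [(1.0, 1.9), (3.0, 6.1), (4.0, 8.0)],
]
result = one_way_ancova(groups)
```

The F values would then be compared with the quantiles of the F(1, N - a - 1) and F(a - 1, N - a - 1) distributions, respectively.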
Table 10.1 Analysis of covariance table for model equation (10.3) of the simple analysis of covariance.

Source of variation    SS                                                                    df           MS
Factor A               $\sum_{i=1}^a Y_{i.}^2/n_i - Y_{..}^2/N = SS_A$                       a - 1        $MS_A = SS_A/(a - 1)$
z                      $SP_{res}^2/SS_{res,z}$                                               1            $MS_z = SP_{res}^2/SS_{res,z}$
Residual               $SS_{res} = SS_{total} - SS_A - SP_{res}^2/SS_{res,z}$                N - a - 1    $MS_{res} = SS_{res}/(N - a - 1)$
Total                  $Y^TY - Y_{..}^2/N = SS_{total}$                                      N - 1

Table 10.2 SS, df and MS of model (10.1) of analysis of covariance.

Source of variation    SS                                                                    df           MS
Components of α        $Y^TWG_WW^TY - Y_{..}^2/N = SS_A$                                     r - 1        $MS_A = SS_A/(r - 1)$
Covariable             $Y^TAZ(Z^TAZ)^{-1}Z^TAY = SS_{cov}$                                   s            $MS_{cov} = SS_{cov}/s$
Residual               $Y^TY - Y^TWG_WW^TY - SS_{cov} = SS_{res}$                            N - r - s    $MS_{res} = SS_{res}/(N - r - s)$
Total                  $Y^TY - Y_{..}^2/N = SS_{total}$                                      N - 1

If $H_0: L^T(\alpha + G_WW^TZ\gamma) = 0_{a-1}$ is true, then F in (10.29) is centrally $F(a - 1, N - a - 1)$-distributed.

In Example 10.3 we demonstrate how ANCOVA tables can be obtained by SPSS.

Table 10.1 is the ANCOVA table for model equation (10.3), a special case of Table 10.2.

For our data set we obtain an analogous output.


10.3.2 A Covariable with Regression Coefficients γ_i Depending on the Levels of the Classification Factor

Similar to Section 10.3.1 the general formulae may be simplified for special models. We leave the derivation for the special cases to the reader.
For model equation (10.4) the null hypotheses below are of interest:

$$H_0: \gamma_1 = \gamma_2 = \cdots = \gamma_a$$

$$H_0: \gamma_1 = \gamma_2 = \cdots = \gamma_a = 0$$

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a$$

We write

$$SS_{res,z,i} = \sum_{j=1}^{n_i} z_{ij}^2 - \frac{1}{n_i}Z_{i.}^2, \quad SP_{res,i} = \sum_{j=1}^{n_i} z_{ij}y_{ij} - \frac{1}{n_i}Z_{i.}Y_{i.}.$$

The components $\gamma_i$ of $(\gamma_1, \ldots, \gamma_a)^T$ are with (10.10) estimated by

$$\hat\gamma_i = \frac{SP_{res,i}}{SS_{res,z,i}}.$$ (10.30)

Because

$$SS_{res} = SS_{res,y} - \sum_{i=1}^a \frac{SP_{res,i}^2}{SS_{res,z,i}}$$

is a quadratic form of rank N - 2a,

$$F = \frac{\sum_{i=1}^a \dfrac{SP_{res,i}^2}{SS_{res,z,i}} - \dfrac{SP_{res}^2}{SS_{res,z}}}{SS_{res}} \cdot \frac{N - 2a}{a - 1}$$ (10.31)

is $F(a - 1, N - 2a)$-distributed if

$$H_0: \gamma_1 = \gamma_2 = \cdots = \gamma_a$$

is true and can be used as test statistic of this hypothesis. If the hypothesis is not rejected, then

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a$$

can be tested with the test statistic (10.28), but the two tests are dependent!
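The class-wise slopes (10.30) and the statistic (10.31) for equal slopes can be sketched in the same elementary way. The function name and the data are hypothetical; the formulas are used as reconstructed in this section.

```python
def slope_homogeneity(groups):
    """groups: list of classes, each a list of (z, y) pairs. Returns slopes and F of (10.31)."""
    a = len(groups)
    n = sum(len(g) for g in groups)
    ss_res_y = ss_res_z = sp_res = explained = 0.0
    slopes = []
    for g in groups:
        n_i = len(g)
        sz, sy = sum(z for z, _ in g), sum(y for _, y in g)
        ss_z_i = sum(z * z for z, _ in g) - sz * sz / n_i     # SS_{res,z,i}
        sp_i = sum(z * y for z, y in g) - sz * sy / n_i       # SP_{res,i}
        ss_res_y += sum(y * y for _, y in g) - sy * sy / n_i
        ss_res_z += ss_z_i
        sp_res += sp_i
        explained += sp_i ** 2 / ss_z_i
        slopes.append(sp_i / ss_z_i)                          # (10.30)
    ss_res = ss_res_y - explained                             # rank N - 2a
    f_slopes = (explained - sp_res ** 2 / ss_res_z) / ss_res * (n - 2 * a) / (a - 1)  # (10.31)
    return slopes, f_slopes

# Hypothetical data: three classes of (z, y) pairs
groups = [
    [(1.0, 3.1), (2.0, 4.9), (4.0, 9.2)],
    [(2.0, 6.0), (3.0, 8.1), (5.0, 11.8), (6.0, 14.2)],
    [(1.0, 1.9), (3.0, 6.1), (4.0, 8.0)],
]
slopes, f_slopes = slope_homogeneity(groups)
```

The numerator of (10.31) is the reduction of the residual sum of squares obtained by allowing a separate slope in each class instead of one common slope, so it cannot be negative.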


10.3.3. A Numerical Example 


Example 10.3 In Figure 10.1 we show, as an SPSS file, laboratory data from four laboratories after (y) and before (z) some treatments have been applied.
We can either continue with
[Figure 10.1: screenshot of the SPSS data editor with the laboratory data (variables before and after) and the Univariate dialog of the General Linear Model with the fields Dependent Variable, Fixed Factor(s) and Random Factor(s).]

Figure 10.1 Laboratory data after (y) and before (z) some treatments from four laboratories. 
Source: Reproduced with permission of IBM. 


Analyze 
General Linear Model 
Multivariate 
(with before and after as two independent variables) or 


Analyze 
General Linear Model 
Univariate 


as we did in Figure 10.1 and will do in this example. In Figure 10.1 under ‘options’, 
we use those shown in the syntax in Figure 10.2a where the first results are also 
given. Further results are shown in Figure 10.2b and Figure 10.2c. 


In Figure 10.2b we find the estimate of the regression coefficient under
'parameter estimates' in the row 'before' as $\hat{\gamma} = 1.45663$. In Figure 10.2c we find
the estimated means and pairwise comparisons of the character 'after' for the
four levels of the factor (laboratory).

Because the regression coefficients $\gamma_i$ $(i = 1, ..., 4)$ differ significantly from each
other, we can estimate them via the SPSS syntax:


UNIANOVA after BY level WITH before
  /METHOD=SSTYPE(1)
  /INTERCEPT=INCLUDE
  /PRINT=PARAMETER
  /CRITERIA=ALPHA(.05)
  /DESIGN=level level*before.
We then obtain the four estimates of the $\gamma_i$ in Figure 10.3 using (10.30) as
$\hat{\gamma}_1 = 1.56313$, $\hat{\gamma}_2 = 1.06604$, $\hat{\gamma}_3 = 0.66509$ and $\hat{\gamma}_4 = 1.54462$.

Figure 10.2 (a-c) ANCOVA for the laboratory data in Figure 10.1. Source: Reproduced with
permission of IBM.




Figure 10.3 Results of ANCOVA for the laboratory data in Figure 10.1. Source: Reproduced 
with permission of IBM. 


10.4 Exercises 


10.1 Give a practical example for model equation (10.24). 


10.2 Derive the test statistic (10.12). 




References 


Graybill, F.A. (1961) An Introduction to Linear Statistical Methods, McGraw Hill 
Book Comp., New York. 

Scheffé, H. (1953) A method for judging all contrasts in the analysis of variance, 
Biometrika, 40, 87-104. 

Searle, S.R. (1971, 2012) Linear Models, John Wiley & Sons, Inc., New York. 


11 


Multiple Decision Problems 


A multiple decision problem is given if a decision function can take on two or
more values. A good overview of this is given in Gupta and Huang (1981).
In this chapter, mainly the case of more than two decisions is discussed; we
then speak of true multiple decision problems. Two-decision problems
occur only in special cases (for $a = 2$). Statistical tests, as typical statistical
two-decision problems, have a decision function with the 'values' acceptance of
$H_0$ and rejection of $H_0$.

In this chapter we assume that statements about $a \ge 2$ populations (distributions)
from a set $G = \{P_1, ..., P_a\}$ of populations must be made. These populations
correspond with random variables $y_i$ with distribution functions
$F(y, \theta_i)$, $i = 1, ..., a$, where for each $i$ at least one component $\theta_{ij}$ of
$\theta_i^T = (\theta_{i1}, ..., \theta_{ip}) \in \Omega \subset R^p$, $p \ge 1$, is unknown. By a real-valued score function
$g^*(\theta_i) = g_i^*$, the $\theta_i$ are mapped into $R^1$. For independent random samples
$Y_i^T = (y_{i1}, ..., y_{in_i})$ with positive integer $n_i$, decisions about the $g_i^*$ shall be made.
We already have statistical tests for this. For instance, in simple analysis of
variance model I with $N(\mu_i, \sigma^2)$ normally distributed $y_i$ with $\theta_i^T = (\mu_i, \sigma^2)$ for
$g^*(\theta_i) = g_i^* = \mu_i$, the null hypothesis


$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$

has to be tested against

$H_A$: At least one pair $i, i'$ exists with $i \ne i'$ $(i, i' = 1, ..., a)$, so that $\mu_i \ne \mu_{i'}$

by the test statistic

$$F = \frac{MS_A}{MS_{res}}$$

(Chapter 5). If $F(f_A, f_{res} \mid 1 - \alpha) < F$ with the degrees of freedom $f_A$ and $f_{res}$ of
$MS_A$ and $MS_{res}$, respectively, $H_0$ is rejected; otherwise it is accepted. This is
an $\alpha$-test, the solution of a statistical two-decision problem.

Real multiple decision problems in this situation would be:


• Order the $g^*(\theta_i)$ according to magnitude.

• Select the $t < a$ largest (smallest) $g^*(\theta_i)$, $1 \le t \le a-1$.

• Decide which differences $g^*(\theta_i) - g^*(\theta_j)$ $(i \ne j;\ i, j = 1, ..., a)$ are different
from zero.

• Decide which differences $g^*(\theta_1) - g^*(\theta_i)$ $(i = 2, ..., a)$ are different from zero.


The number of possible decisions differs in the four examples above, but for
$a > 2$ it is always larger than two. The number of possible decisions equals

$$a!, \quad \binom{a}{t}, \quad 2^{\binom{a}{2}} \quad \text{and} \quad 2^{a-1}, \text{ respectively.}$$
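The four counts can be checked for small cases with a few lines of Python. This is only an illustrative sketch; the function name and the dictionary keys are chosen here and do not come from the text.

```python
from math import comb, factorial


def decision_counts(a: int, t: int) -> dict:
    """Number of possible decisions for the four problems listed above."""
    return {
        "ordering": factorial(a),          # all rank orders of a values
        "select_t": comb(a, t),            # subsets of size t
        "all_pairwise": 2 ** comb(a, 2),   # each pairwise difference: zero or not
        "against_first": 2 ** (a - 1),     # each difference to the first: zero or not
    }


print(decision_counts(a=4, t=2))
```

Already for moderate $a$ the number of possible decisions grows quickly, which is why these problems are treated separately from two-decision tests.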


11.1 Selection Procedures 


To define selection procedures we first have to order the populations in G by 
magnitude. For this we need an order relation. 


11.1.1 Basic Ideas 


Definition 11.1 A population $P_k$ is considered better than the population
$P_j$ $(j, k = 1, ..., a,\ j \ne k)$, if $g_k^* = g^*(\theta_k) > g^*(\theta_j) = g_j^*$. $P_k$ is considered not worse than
$P_j$ if $g_k^* \ge g_j^*$.

The values $g_1^*, ..., g_a^*$ can be ordered as the $a$ populations; if $g_{(i)}^*$ is the $i$th
ordered (by magnitude) $g^*$ value, then we have $g_{(1)}^* \le g_{(2)}^* \le \cdots \le g_{(a)}^*$.

Next we renumber the populations by permuting the indices $1, ..., a$. To
avoid confusion between the original and the permuted indices, we denote
the populations, the random variables, the parameters and the score functions
afresh. The permutation

$$\begin{pmatrix} 1 & 2 & \cdots & a \\ (1) & (2) & \cdots & (a) \end{pmatrix}$$

transforms the population $P_{(i)}$ with its parameter $\theta_{(i)}$ and its random variable
$y_{(i)}$ belonging to $g_{(i)}^*$ into the population $A_i$ with the random variable $x_i$ and
the parameter $\eta_i$, respectively. We write further
$g^*(\theta_{(i)}) = g(\eta_i) = g_i$ and by this get the rank order

$$g(\eta_1) \le g(\eta_2) \le \cdots \le g(\eta_a), \qquad (11.1)$$

and $A_i$ is not worse than $A_{i^*}$, if $i \ge i^*$. But do not forget that the permutation is
unknown, and we used it only to simplify writing.


Definition 11.2 If the set $G = \{A_1, ..., A_a\} = \{P_1, ..., P_a\}$ shall be partitioned into
at least two subsets, so that one of the subsets contains the better elements of $G$
following Definition 11.1, we have a selection problem.
A decision function (rule) performing such a partition is called selection rule or
selection procedure.


Definition 11.3 If the elements of G are fixed (not randomly selected), we call 
this model I of selection. But if the elements have been randomly sampled 
from a larger universe, we call this model II of selection. 


We restrict ourselves in this book to model I of selection. Model II occurs 
mainly in animal and plant breeding and is discussed within population genetics 
(see Rasch and Herrendörfer, 1990).

The theory of model I is about 65 years old (see Miescke and Rasch, 1996). 

We consider the case that $G$ shall be partitioned exactly into two
subsets $G_1$ and $G_2$ so that $G = G_1 \cup G_2$, $G_1 \cap G_2 = \emptyset$, $G_1 = \{A_a, ..., A_{a-t+1}\}$
and $G_2 = \{A_{a-t}, ..., A_1\}$.


Problem 1 (Bechhofer, 1954).

For a given risk of wrong decision $\beta$ with $\binom{a}{t}^{-1} < 1 - \beta < 1$ and $d > 0$, from $G$
a subset $M_B$ of size $t$ has to be selected. Selection is made based on random
samples $(x_{i1}, ..., x_{in_i})$ from $A_i$ whose components are distributed like $x_i$. Select $M_B$ in such
a way that the probability $P(CS)$ of a correct selection is

$$P(CS) = P_C = P(M_B = G_1 \mid d(G_1, G_2) \ge d) \ge 1 - \beta. \qquad (11.2)$$

In (11.2) $d(g_{a-t+1}, g_{a-t})$ is the distance between $A_{a-t+1}$ and $A_{a-t}$.
The distance $d(G_1, G_2) = d(g_{a-t+1}, g_{a-t})$ between $G_1$ and $G_2$ is at least
equal to the value $d$ given in advance. A modified formulation is as follows:


Problem 1A Select a subset $M_B$ of size $t$ corresponding to Problem 1 in such a
way that in place of (11.2),

$$P_C^* = P(M_B \subset G_1^*) \ge 1 - \beta. \qquad (11.3)$$

Here $G_1^*$ is the set in $G$ containing all $A_i$ with $g_i \ge g_{a-t+1} - d$.

The condition $\binom{a}{t}^{-1} < 1 - \beta$ above is reasonable, because for $1 - \beta \le \binom{a}{t}^{-1}$
no real statistical problem exists. Without experimenting one could then denote
any of the $\binom{a}{t}$ subsets of size $t$ by $M_B$ and would with $\binom{a}{t}^{-1} \ge 1 - \beta$ fulfil
(11.2) and (11.3).
Problem 2 (Gupta, 1956).

For a given risk $\beta$ of an incorrect decision with $\binom{a}{t}^{-1} < 1 - \beta < 1$, select from
$G$ a subset $M_B$ of random size $r$ so that

$$P\{G_1 \subset M_B\} \ge 1 - \beta. \qquad (11.4)$$

For this, an optimality criterion has to be considered. For instance, we could
demand that one of the following properties holds:

• $E(r) \to$ Min,
• $E(w) \to$ Min, where $w$ is the number of wrongly selected populations,
• The experimental costs are minimised.


In Problems 1 and 1A, selection is not named incorrect, as long as the distance
between the worst of the $t$ best populations and any non-best population does
not exceed a value $d$ fixed in advance. The region $[g_{a-t+1} - d,\ g_{a-t+1}]$ is called
indifference zone, and Problem 1 is often called the indifference zone formulation
of the selection problem; Problem 2, however, is called the subset formulation of the
selection problem.

One can ask which of the two problem formulations should be used for practical
purposes. Often experiments with $a$ technologies, $a$ varieties, medical
treatments and others have the purpose of selecting the best of them (i.e. $t = 1$).
If we then have a lot of candidates at the beginning (say about $a = 500$), such
as in drug screening, then it is reasonable, at first, to reduce the number of
candidates by a subset procedure down to, let us say, $r < 20$ (or $r < 50$) and then in a
second step to use an indifference zone procedure with $a = r$.

Before special cases are handled, we note that in place of Problem 1, Problem 1A
can always be used. There are advantages in application. The researcher
could ask what can be said about the probability that we really selected the $t$ best
populations (concerning $g(\eta)$) if $d(g_{a-t+1}, g_{a-t}) < d$. An answer to such a
question is not possible, but it is better to formulate Problem 1A, which can be
interpreted more easily and where we now know at least with probability $1 - \beta$ that we selected
exactly those $t$ populations not being more than $d$ worse than $A_{a-t+1}$.

Guiard (1994) could show that the least favourable cases concerning the
values of $P_C$ and $P_C^*$ for Problems 1 and 1A are identical. By this, the lower bounds
$1 - \beta$ in (11.2) and (11.3) are equal (for the same $d$). We call $P_C^*$ the probability of
a $d$-precise selection.


11.1.2 Indifference Zone Formulation for Expectations 


In this section we discuss Problem 1A and restrict ourselves to univariate
random variables $y_i$ and $x_i$. Further let $g_{(i)}^* = g^*(\theta_{(i)}) = g(\eta_i) = E(x_i) = \mu_i$.
Then $d(g_j, g_{a-t+1}) = |\mu_j - \mu_{a-t+1}|$ (and $d(g_{a-t+1}, g_{a-t}) = \mu_{a-t+1} - \mu_{a-t}$ in
Problem 1). For the selection procedure, take from each of the $a$ populations a random
sample $(x_{i1}, ..., x_{in_i})$. These random samples are assumed to be stochastically
independent; the components $x_{ij}$ are assumed to be distributed like $x_i$.
Decisions will be based on estimators of the $\mu_i$.

Selection Rule 11.1 From the $a$ independent random samples, the sample
means $\bar{x}_1, ..., \bar{x}_a$ are calculated, and then we select the $t$ populations with the
$t$ largest means into the set $M_B$ (see Bechhofer, 1954).

Selection Rule 11.1 can be applied if the $\mu_i$ are the only unknown components of the $\eta_i$.
If further components of the $\eta_i$ are unknown, we apply a multistage selection
procedure (see Section 11.1.2.1).


11.1.2.1 Selection of Populations with Normal Distribution
We assume that the $x_i$ introduced in Section 11.1.2 are $N(\mu_i, \sigma_i^2)$-distributed
(i.e. $p = 2$). As mentioned above we renumber the $\mu_i$ so that

$$\mu_1 \le \mu_2 \le \cdots \le \mu_a. \qquad (11.5)$$

Let the $\sigma_i^2$ be known and equal to $\sigma^2$. Then we have


Theorem 11.1 (Bechhofer, 1954).
Under the assumptions of this section and if $n_i = n$ $(i = 1, ..., a)$, for

$$P_0 = P\{\max(\bar{x}_1, ..., \bar{x}_{a-t}) < \min(\bar{x}_{a-t+1}, ..., \bar{x}_a)\} \qquad (11.6)$$

and $\mu_{a-t+1} - \mu_{a-t} \ge d$, with $d^* = \frac{d\sqrt{n}}{\sigma}$,

$$P_0 \ge t \int_{-\infty}^{\infty} [\Phi(z + d^*)]^{a-t}\,[1 - \Phi(z)]^{t-1}\,\varphi(z)\,dz \qquad (11.7)$$

always holds. If we apply Selection Rule 11.1, $P_0$ in (11.7) can be replaced by $P_C$.
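Proof: $P_0$ is smallest if

$$\mu_1 = \cdots = \mu_{a-t} = \mu_{a-t+1} - d = \cdots = \mu_a - d. \qquad (11.8)$$

We now consider the $t$ exclusive events:

$$\max(\bar{x}_1, ..., \bar{x}_{a-t}) < \bar{x}_u < \min_{a-t+1 \le v \le a,\ v \ne u}(\bar{x}_v) \quad (u = a-t+1, ..., a). \qquad (11.9)$$

Under (11.8) all these events have the same probability $P_1$, so that for
$\mu_{a-t+1} - \mu_{a-t} \ge d$ always $P_0 \ge t P_1$. With $f(\bar{x}_{a-t+1})$ as density function
of $\bar{x}_{a-t+1}$, we write

$$P_1 = \int_{-\infty}^{\infty} [P(\bar{x}_{a-t} < \bar{x}_{a-t+1})]^{a-t}\,[P(\bar{x}_a > \bar{x}_{a-t+1})]^{t-1}\,f(\bar{x}_{a-t+1})\,d\bar{x}_{a-t+1}.$$

If $\varphi(z)$ is as usual the density function of a $N(0,1)$ distribution, then with

$$A = \frac{\sqrt{n}}{\sigma}(\bar{x}_{a-t+1} - \mu_{a-t}) \quad \text{and} \quad B = \frac{\sqrt{n}}{\sigma}(\bar{x}_{a-t+1} - \mu_{a-t+1})$$

we get

$$P_0 \ge t P_1 = t \int_{-\infty}^{\infty} \left[\int_{-\infty}^{A} \varphi(u)\,du\right]^{a-t} \left[\int_{B}^{\infty} \varphi(u)\,du\right]^{t-1} \frac{\sqrt{n}}{\sigma}\,\varphi(B)\,d\bar{x}_{a-t+1}.$$

Because

$$A - B = \frac{\sqrt{n}}{\sigma}(\mu_{a-t+1} - \mu_{a-t}) = \frac{\sqrt{n}}{\sigma}\,d = d^*,$$

we complete the proof by using the distribution function $\Phi$ of the standard
normal distribution.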
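The lower bound in (11.7) is easy to evaluate numerically. The following sketch uses only the Python standard library and a composite Simpson rule; the function names, the integration range $[-10, 10]$ and the number of steps are choices made here for illustration, not part of the text. Two facts can serve as plausibility checks: for $a = 2$, $t = 1$ the integral equals $\Phi(d^*/\sqrt{2})$, and for $d^* = 0$ the bound reduces to $\binom{a}{t}^{-1}$.

```python
import math


def Phi(z: float) -> float:
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z: float) -> float:
    """Density function of N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def bechhofer_lower_bound(a: int, t: int, d_star: float,
                          lo: float = -10.0, hi: float = 10.0,
                          steps: int = 4000) -> float:
    """Right-hand side of (11.7), integrated with Simpson's rule (steps even)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * Phi(z + d_star) ** (a - t) * (1.0 - Phi(z)) ** (t - 1) * phi(z)
    return t * total * h / 3.0


# a = 2, t = 1 with the tabulated value for 1 - beta = 0.95
print(round(bechhofer_lower_bound(2, 1, 2.3262), 4))
# a = 10, t = 4 with the tabulated value 3.9184 used in Example 11.1
print(round(bechhofer_lower_bound(10, 4, 3.9184), 4))
```

If the tabulated values of $\sqrt{n}\,d/\sigma$ are consistent with (11.7), both printed values should lie close to 0.95.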


For the often occurring special case $t = 1$ formula (11.7) becomes

$$P_0 \ge \int_{-\infty}^{\infty} [\Phi(z + d^*)]^{a-1}\,\varphi(z)\,dz, \qquad (11.10)$$

and this can be simplified following Theorem 11.2.


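Theorem 11.2 Under the assumptions of Theorem 11.1 with $t = 1$ and with
$\mu_a - \mu_j = d_{a,j}$ $(j = 1, ..., a-1)$, we receive (without the condition $\mu_{a-t+1} - \mu_{a-t} \ge d$)

$$P_0 = P\{\max(\bar{x}_1, ..., \bar{x}_{a-1}) < \bar{x}_a\} = \frac{1}{\sqrt{a\pi^{a-1}}} \int_{-D_{a-1}}^{\infty} \cdots \int_{-D_1}^{\infty} e^{-\frac{1}{2} t_v^T R^{-1} t_v}\,dt_v \qquad (11.11)$$

with $D_i = \frac{d_{a,i}\sqrt{n}}{\sigma\sqrt{2}}$, $t_v = (t_1, ..., t_{a-1})^T$, $R = (\varrho_{ij})$ and

$$\varrho_{ij} = \begin{cases} 1, & \text{if } i = j \\ \frac{1}{2}, & \text{if } i \ne j \end{cases} \quad (i, j = 1, ..., a-1).$$


Proof: With $z_i = \bar{x}_a - \bar{x}_i$ $(i = 1, ..., a-1)$, $P_0$ becomes

$$P_0 = P\{z_1 > 0, ..., z_{a-1} > 0\}.$$

Further $E(z_i) = d_{a,i}$.
From the independence of the $a$ random samples,

$$\operatorname{var}(z_i) = \frac{2\sigma^2}{n},$$

and for $i \ne j$

$$\operatorname{cov}(z_i, z_j) = \operatorname{var}(\bar{x}_a) = \frac{\sigma^2}{n}.$$

Because the $x_{ij}$ are independently $N(\mu_i, \sigma^2)$-distributed, $z = (z_1, ..., z_{a-1})^T$ is
$(a-1)$-dimensional normally distributed with

$$E(z) = (d_{a,1}, ..., d_{a,a-1})^T = \Delta \quad \text{and} \quad \operatorname{var}(z) = \Sigma = \frac{\sigma^2}{n}\,(I_{a-1} + 1_{a-1,a-1}).$$

Therefore $P_0$ has the form

$$P_0 = \frac{1}{\sqrt{(2\pi)^{a-1}\,|\Sigma|}} \int_0^{\infty} \cdots \int_0^{\infty} e^{-\frac{1}{2}(z-\Delta)^T \Sigma^{-1} (z-\Delta)}\,dz.$$

From Lemma 6.1 it follows that $|\Sigma| = a\left(\frac{\sigma^2}{n}\right)^{a-1}$, and after the substitution
$t = \sqrt{n}\,(z - \Delta)/(\sigma\sqrt{2})$, we obtain for $P_0$ Equation (11.11) with
$R = (I_{a-1} + 1_{a-1,a-1})/2$.
But now $d_{a,a-1} \le d_{a,a-2} \le \cdots \le d_{a,1}$ and therefore

$$P_0 \ge \frac{1}{\sqrt{a\pi^{a-1}}} \int_{z}^{\infty} \cdots \int_{z}^{\infty} e^{-\frac{1}{2} t_v^T R^{-1} t_v}\,dt_v \qquad (11.12)$$

with $z = -\sqrt{n}\,d_{a,a-1}/(\sigma\sqrt{2})$, that is, the least favourable case (equality sign in
(11.12)) is that with $d_{a,a-1} = d_{a,1}$.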

We now define the $\beta$-quantile $z(a-1, \beta) = -z(a-1, 1-\beta)$ of the $(a-1)$-dimensional
normal distribution with expectation vector $0_{a-1}$ and covariance
matrix $R$ by

$$\beta = \frac{1}{\sqrt{a\pi^{a-1}}} \int_{-\infty}^{z(a-1,\beta)} \cdots \int_{-\infty}^{z(a-1,\beta)} e^{-\frac{1}{2} t_v^T R^{-1} t_v}\,dt_v. \qquad (11.13)$$

Putting in (11.12),

$$z = z(a-1, \beta)/\sqrt{2} = -z(a-1, 1-\beta)/\sqrt{2},$$

gives $P_0 \ge 1 - \beta$. If $d_{a,a-1} = d$ and if we choose $n$ so that

$$n \ge \frac{2\sigma^2}{d^2}\,z^2(a-1, 1-\beta), \qquad (11.14)$$

then Selection Rule 11.1 for $t = 1$ gives a correct selection at least with probability $1 - \beta$.
Table 11.1 shows the values $z(a-1, 1-\beta)$ for $a = 2(1)39$.

Table 11.1 is not needed if we solve (11.14) by R. We use the OPDOE program
with

> size.selection.bechhofer(a=..., beta=..., delta=..., sigma=...)

This program can also be used to calculate the smallest $d$ from $n$, $\sigma^2$, $\beta$ and $a$.
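For $t = 1$ the sample size condition (11.14), read here as $n \ge 2\sigma^2 z^2(a-1, 1-\beta)/d^2$, can also be evaluated by hand or with a few lines of Python once the quantile is taken from Table 11.1. The sketch below is an illustration only: the function name is hypothetical, and the inputs $\sigma^2 = 300$ and $d = 22$ are borrowed from the setting of Example 11.1 purely as sample numbers.

```python
import math


def sample_size_best_of_a(z_quantile: float, sigma2: float, d: float) -> int:
    """Smallest integer n with n >= 2 * sigma^2 * z^2 / d^2 (reading of (11.14))."""
    return math.ceil(2.0 * sigma2 * z_quantile ** 2 / d ** 2)


# Table 11.1: a - 1 = 9 (a = 10 populations), beta = 0.05
z = 2.417
n = sample_size_best_of_a(z, sigma2=300.0, d=22.0)
print(n)

# Cross-check against Table 11.2 (a = 10, t = 1, 1 - beta = 0.95):
# sqrt(n) * d / sigma should equal sqrt(2) * z.
print(round(math.sqrt(2.0) * z, 4))
```

The second printed value agrees with the entry 3.4182 of Table 11.2 up to rounding, which supports the factor 2 in the sample size formula.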
If $t > 1$, we determine $n$ so that the right-hand side of (11.7) never takes a value
smaller than $1 - \beta$. Table 11.2 gives for some $(a, t)$-combinations the values $\sqrt{n}\,d/\sigma$.
Then, with given $d$ and $\sigma$, the proper $n$ can be calculated.

In the examples the populations are given in their original form $P_1, ..., P_a$.


Example 11.1 Select from $a = 10$ given populations $P_1, ..., P_{10}$ the $t = 4$ with
the largest expectations!

Let us assume that we know from experiments in the past that the character
investigated can be modelled by a $N(\mu_i, \sigma^2)$-distributed random variable with
$\sigma^2 = 300$.

How many observations are needed in each of the ten populations to obtain
for Problems 1 and 1A that $P_C \ge 0.95$ ($P_C^* \ge 0.95$), respectively, if we choose
$d = 22$?

In Table 11.2 we find for $1 - \beta = 0.95$, $a = 10$, $t = 4$

$$\sqrt{n}\,\frac{d}{\sigma} = 3.9184$$

so that

$$n = \left\lceil \frac{3.9184^2\,\sigma^2}{d^2} \right\rceil = \left\lceil \frac{3.9184^2 \cdot 300}{22^2} \right\rceil = \lceil 9.52 \rceil = 10.$$

By R of course we obtain the same value. We now observe 10 data per population
and receive the sample means in Table 11.3.
Using Selection Rule 11.1 we have to select the populations $P_1$, $P_2$, $P_3$ and $P_7$.
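The two steps of the example, the sample size and the application of Selection Rule 11.1, can be retraced as follows. The sketch assumes a variance of 300, which is the reading that reproduces the value 9.52, and uses the sample means of Table 11.3; the variable names are chosen here for illustration.

```python
import math

# Step 1: sample size from the tabulated value sqrt(n) * d / sigma.
d_star = 3.9184          # Table 11.2: 1 - beta = 0.95, a = 10, t = 4
sigma2, d, t = 300.0, 22.0, 4
n = math.ceil(d_star ** 2 * sigma2 / d ** 2)

# Step 2: Selection Rule 11.1 with the sample means of Table 11.3.
means = {
    "P1": 138.6, "P2": 132.2, "P3": 138.4, "P4": 122.7, "P5": 130.6,
    "P6": 131.0, "P7": 139.2, "P8": 131.7, "P9": 128.0, "P10": 122.5,
}
# Keep the t populations with the largest sample means.
selected = sorted(means, key=means.get, reverse=True)[:t]

print(n, selected)
```

The rule only needs the ranking of the sample means; the size of the gaps between them plays no role once $n$ has been fixed.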


Bechhofer showed for Selection Rule 11.1 that $P_0$ is a maximal lower bound
for the probability of a correct selection if we have normal distributions with
known equal variances and use $n_i = n$ for fixed $a$, $t$ and $d$.

If $\sigma^2$ is unknown, for $t = 1$ a two-stage selection rule is proposed.


Selection Rule 11.2 Calculate from observations (realisations of $x_{ij}$)
$x_{ij}$ $(i = 1, ..., a;\ j = 1, ..., n_0)$ from the $a$ populations $A_1, ..., A_a$ with $10 \le n_0 \le 30$
as in Table 5.2 the estimate $s_0^2 = MS_{res}$ with $f = a(n_0 - 1)$ degrees of freedom.
For given $d$ and $\beta = 0.05$, $0.025$ or $0.01$, respectively, we calculate further with
the $(1 - \beta)$-quantile of the $(a-1)$-dimensional $t$-distribution with $f$ degrees of
freedom $t(a-1, f, 1-\beta)$ in Table 11.4 analogously to (11.14) the value

$$c = \frac{d^2}{2\,t^2(a-1, f, 1-\beta)}. \qquad (11.15)$$

Then we round $s_0^2/c$ up to the next integer (no rounding, if $s_0^2/c$ is already an
integer) and choose the maximum of $n_0$ and the rounded value as final sample size $n$.
If $n > n_0$, we take from each of the $a$ populations $n - n_0$ additional observations;
otherwise $n_0$ is the final sample size. With $n$ and $n_0$ we continue as in Selection
Rule 11.1 for $t = 1$.


Table 11.1 Quantiles $z(a-1, 1-\beta)$ of the $(a-1)$-dimensional standardised normal
distribution with correlation 1/2.

a-1    β=0.01   β=0.025  β=0.05   β=0.10   β=0.25
1      2.326    1.960    1.645    1.282    0.675
2      2.558    2.212    1.916    1.577    1.014
3      2.685    2.350    2.064    1.735    1.189
4      2.772    2.442    2.160    1.838    1.306
5      2.837    2.511    2.233    1.916    1.391
6      2.889    2.567    2.290    1.978    1.458
7      2.933    2.613    2.340    2.029    1.514
8      2.970    2.652    2.381    2.072    1.560
9      3.002    2.686    2.417    2.109    1.601
10     3.031    2.716    2.448    2.142    1.636
11     3.057    2.743    2.477    2.172    1.667
12     3.079    2.768    2.502    2.180    1.696
13     3.100    2.790    2.525    2.222    1.724
14     3.120    2.810    2.546    2.244    1.745
15     3.138    2.829    2.565    2.264    1.767
16     3.154    2.846    2.583    2.283    1.787
17     3.170    2.863    2.600    2.301    1.805
18     3.185    2.878    2.616    2.317    1.823
19     3.198    2.892    2.631    2.332    1.839
20     3.211    2.906    2.645    2.347    1.854
21     3.223    2.918    2.658    2.361    1.869
22     3.235    2.930    2.671    2.374    1.883
23     3.246    2.942    2.683    2.386    1.896
24     3.257    2.953    2.694    2.392    1.908
25     3.268    2.964    2.705    2.409    1.920
26     3.276    2.973    2.715    2.420    1.931
27     3.286    2.983    2.725    2.430    1.942
28     3.295    2.993    2.735    2.440    1.953
29     3.303    3.001    2.744    2.450    1.963
30     3.312    3.010    2.753    2.459    1.972
31     3.319    3.018    2.761    2.467    1.982
32     3.327    3.026    2.770    2.476    1.990
33     3.335    3.034    2.777    2.484    1.999
34     3.342    3.041    2.785    2.492    2.007
35     3.349    3.048    2.792    2.500    2.015
36     3.355    3.055    2.800    2.507    2.023
37     3.362    3.062    2.807    2.514    2.031
38     3.368    3.069    2.813    2.521    2.038
39     3.374    3.075    2.820    2.528    2.045


Table 11.2 Values $\sqrt{n}\,d/\sigma$ for the selection of the $t$ best of $a$ populations with normal
distribution with probability of a correct selection at least equal to $1 - \beta$ (Bechhofer, 1954).

        a=2      a=3      a=4      a=4      a=5
1-β     t=1      t=1      t=1      t=2      t=1
0.99    3.2900   3.6173   3.7970   3.9323   3.9196
0.98    2.9045   3.2533   3.4432   3.5893   3.5722
0.97    2.6598   3.0232   3.2198   3.3734   3.3529
0.96    2.4759   2.8504   3.0522   3.2117   3.1885
0.95    2.3262   2.7101   2.9162   3.0808   3.0552
0.94    2.1988   2.5909   2.8007   2.9698   2.9419
0.93    2.0871   2.4865   2.6996   2.8728   2.8428
0.92    1.9871   2.3931   2.6092   2.7861   2.7542
0.91    1.8961   2.3082   2.5271   2.7075   2.6737
0.90    1.8124   2.2302   2.4516   2.6353   2.5997
0.88    1.6617   2.0899   2.3159   2.5057   2.4668
0.86    1.5278   1.9655   2.1956   2.3910   2.3489
0.84    1.4064   1.8527   2.0867   2.2873   2.2423
0.82    1.2945   1.7490   1.9865   2.1921   2.1441
0.80    1.1902   1.6524   1.8932   2.1035   2.0528
0.75    0.9539   1.4338   1.6822   1.9038   1.8463
0.70    0.7416   1.2380   1.4933   1.7253   1.6614
0.65    0.5449   1.0568   1.3186   1.5609   1.4905
0.60    0.3583   0.8852   1.1532   1.4055   1.3287
0.55    0.1777   0.7194   0.9936   1.2559   1.1726


Table 11.2 


1-B 
0.99 
0.98 
0.97 
0.96 
0.95 
0.94. 
0.93 
0.92 
0.91 
0.90 
0.88 
0.86 
0.84 
0.82 
0.80 
0.75 
0.70 
0.65 
0.60 
0.55 


1-8 
0.99 
0.98 
0.97 
0.96 
0.95 
0.94. 
0.93 
0.92 
0.91 
0.90 


(Continued) 


4.0121 
3.6692 
3.4528 
3.2906 
3.1591 
3.0474 
2.94.96 
2.8623 
2.7829 
2.7100 
2.5789 
2.4627 
2.3576 
2.2609 
2.1709 
1.9674 
1.7852 
1.6168 
1.4575 
1.3037 


t=3 


4.3926 
4.0758 
3.8773 
3.7293 
3.6097 
3.5086 
3.4203 
3.3417 
3.2704 
3.2051 


a=6 

4.2244 
3.8977 
3.6925 
3.5393, 
3.4154 
3.3104 
3.2187 
3.1370 
3.0628 
2.9948 
2.8729 
2.7651 
2.6677 
2.5784 
2.4955 
2.3086 
2.1421 
1.9888 
1.8443 
1.7054. 


4.1475 
3.8107 
3.5982 
3.4390 
3.3099 
3.2002 
3.1043 
3.0186 
2.9407 
2.8691 


a=6 

t=3 

4.2760 
3.9530 
3.7504 
3.5992 
3.4769 
3.3735 
3.2831 
3.2026 
3.1296 
3.0627 
2.9427 
2.8368 
2.7411 
2.6535 
2.5720 
2.3887 
2.2256 
2.0756 
1.9342 
1.7985 


t=2 


4.3858 
4.0669 
3.8668 
3.7175 
3.5968 
3.4946 
3.4054 
3.3258 
3.2537 
3.1876 


4.4807 
4.1683 
3.9728 
3.8270 
3.7093 
3.6097 
3.5229 
3.4456 
3.3755 
3.3113 


(Continued) 


Table 11.2 (Continued) 


a=7 a=7 a=8 a=8 a=8 
1-£ t=2 t=3 t=1 t=2 t=3 
0.88 2.9824 3.0880 2.7406 3.0691 3.1963 
0.86 2.8764 2.9847 2.6266 2.9644 3.0948 
0.84. 2.7806 2.8915 2.5235 2.8698 3.0032 
0.82 2.6929 2.8061 2.4286 2.7832 2.9194 
0.80 2.6113 2.7269 2.3403 2.7027 2.8416 
0.75 2.4277 2.5485 2.1407 2.5215 2.6666 
0.70 2.2641 2.3899 1.9621 2.3601 2.5111 
0.65 2.1137 2.2442 1.7970 2.2116 2.3683 
0.60 1.9719 2.1071 1.6407 2.0718 2.2340 
0.55 1.8355 1.9754. 1.4899 1.9374. 2.1051 
        a=8      a=9      a=9      a=9      a=9
1-β     t=4      t=1      t=2      t=3      t=4
0.99 4.5078 4.1999 4.4455 4.5513 4.5950 
0.98 4.1972 3.8653 4.1292 4.2423 4.2888 
0.97 4.0029 3.6543 3.9308 4.0489 4.0974 
0.96 3.8581 3.4961 3.7829 3.9048 3.9548 
0.95 3.7412 3.3679 3.6633 3.7885 3.8398 
0.94. 3.6424 3.2590 3.5620 3.6902 3.7426 
0.93 3.5562 3.1637 3.4736 3.6045 3.6579 
0.92 3.4794 3.0785 3.3948 3.5280 3.5825 
0.91 3.4099 3.0012 3.3234 3.4589 3.5142 
0.90 3.3462 2.9301 3.2579 3.3955 3.4516 
0.88 3.2322 2.8024 3.1405 3.2820 3.3395 
0.86 3.1316 2.6893 3.0368 3.1818 3.2408 
0.84. 3.0408 2.5868 2.9433 3.0915 3.1518 
0.82 2.9577 2.4926 2.8575 3.0088 3.0703 
0.80 2.8807 2.4049 2.7778 2.9321 2.9947 
0.75 2.7074 2.2067 2.5984 2.7596 2.8249 
0.70 2.5535 2.0293 2.4387 2.6064 2.6741 
0.65 2.4122 1.8653 2.2919 2.4658 2.5359 
0.60 2.2794 1.7102 2.1535 2.3335 2.4059 


0.55 2.1520 1.5604. 2.0206 2.2066 2.2814 


Table 11.2 


1-f 
0.99 
0.98 
0.97 
0.96 
0.95 
0.94. 
0.93 
0.92 
0.91 
0.90 
0.88 
0.86 
0.84 
0.82 
0.80 
0.75 
0.70 
0.65 
0.60 
0.55 


1-f 
0.99 
0.98 
0.97 
0.96 
0.95 
0.94. 
0.93 
0.92 
0.91 
0.90 


(Continued) 


a=10 
t=1 

4.2456 
3.9128 
3.7030 
3.5457 
3.4182 
3.3099 
3.2152 
3.1305 
3.0536 
2.9829 
2.8560 
2.7434 
2.6418 
2.5479 
2.4608 
2.2637 
2.0873 
1.9242 
1.7700 
1.6210 


a=11 
t=2 

4.5408 
4.2286 
4.0329 
3.8869 
3.7689 
3.6691 
3.5819 
3.5042 
3.4338 
3.3693 


a=10 
t=2 

4.4964. 
4.1823 
3.9854 
3.8385 
3.7198 
3.6193 
3.5316 
3.4534 
3.3826 
3.3176 
3.2011 
3.0983 
3.0055 
2.9203 
2.8413 
2.6635 
2.5051 
2.3595 
2.2224 
2.0907 


a=11 
t=3 

4.6602 
4.3560 
4.1658 
4.0242 
3.9099 
3.8133 
3.7291 
3.6541 
3.5862 
3.5239 


a=10 
t=3 

4.6100 
4.3037 
4.1120 
3.9693, 
3.8541 
3.7567 
3.6718 
3.5962 
3.5277 
3.4650 
3.3526 
3.2535 
3.1642 
3.0824 
3.0065 
2.8360 
2.6845 
2.5456 
2.4149 
2.2896 


a=11 
t=4 
4.7229 
4.4227 
4.2353 
4.0958 
3.9834 
3.8883, 
3.8055 
3.7318 
3.6652 
3.6041 


a=10 
t=4 

4.6648 
4.3619 
4.1727 
4.0319 
3.9184 
3.8224 
3.7387 
3.6643, 
3.5969 
3.5351 
3.4246 
3.3272 
3.2395 
3.1591 
3.0847 
2.9174 
2.7690 
2.6330 
2.5052 
2.3827 


a=11 
t=5 
4.7506 
4.4522 
4.2660 
4.1274 
4.0158 
3.9214 
3.8392 
3.7661 
3.6999 
3.6393, 


a=10 
t=5 

4.6814 
4.3796 
4.1911 
4.0509 
3.9378 
3.8422 
3.7589 
3.6848 
3.6177 
3.5563, 
3.4463 
3.3494 
3.2621 
3.1822 
3.1082 
2.9419 
2.7944 
2.6592 
2.5322 
2.4106 


a=12 
t=3 

4.7039 
4.4016 
4.2126 
4.0719 
3.9584 
3.8624 
3.7788 
3.7043, 
3.6369 
3.5751 
Table 11.2 (Continued)


1-B 


0.88 
0.86 
0.84. 
0.82 
0.80 
0.75 
0.70 
0.65 
0.60 
0.55 


1-B 
0.99 
0.98 
0.97 
0.96 
0.95 
0.94. 
0.93 
0.92 
0.91 
0.90 
0.88 
0.86 
0.84. 
0.82 
0.80 
0.75 
0.70 
0.65 
0.60 
0.55 


a=11 
t=2 


3.2536 
3.1514 
3.0592 
2.9747 
2.8963 
2.7196 
2.5624 
2.4179 
2.2818 
2.1510 


a=12 
t=4 

4.7725 
4.4746 
4.2886 
4.1502 
4.0387 
3.9444 
3.8623 
3.7893 
3.7232 
3.6626 
3.5543 
3.4588 
3.3729 
3.2942 
3.2213 
3.0577 
2.9125 
2.7796 
2.6547 
2.5352 


a=11 
t=3 


3.4126 
3.3143 
3.2258 
3.1447 
3.0695 
2.9006 
2.7505 
2.6129 
2.4835 
2.3594 


a=12 
t=5 

4.8083 
4.5126 
4.3281 
4.1909 
4.0803 
3.9870 
3.9057 
3.8333 
3.7678 
3.7079 
3.6007 
3.5063 
3.4213 
3.3435 
3.2715 
3.1098 
2.9666 
2.8354 
2.7122 
2.5944 


a=11 
t=4 


3.4948 
3.3984. 
3.3117 
3.2323 
3.1587 
2.9934. 
2.8468 
2.7125 
2.5863 
2.4654 


a=13 
t=4 

4.8158 
4.5197 
4.3350 
4.1975 
4.0867 
3.9932 
3.9117 
3.8391 
3.7735 
3.7134 
3.6059 
3.5111 
3.4259 
3.3478 
3.2755 
3.1132 
2.9693 
2.8374 
2.7137 
2.5952 


a=11 
t=5 


3.5309 
3.4354 
3.3494. 
3.2707 
3.1978 
3.0341 
2.8890 
2.7560 
2.6312 
2.5117 


a=13 
t=5 

4.8576 
4.5641 
4.3810 
4.2449 
4.1353 
4.4027 
3.9521 
3.8904 
3.8255 
3.7661 
3.6599 
3.5664 
3.4822 
3.4052 
3.3339 
3.1739 
3.0321 
2.9023 
2.7805 
2.6640 


Table 11.3 Sample means of Example 11.1.

Population    P1     P2     P3     P4     P5     P6     P7     P8     P9     P10
Sample mean   138.6  132.2  138.4  122.7  130.6  131.0  139.2  131.7  128.0  122.5


Table 11.4 Quantiles $t(a-1, f, 1-\beta)$ of the $(a-1)$-dimensional $t$-distribution with correlation 1/2.

β = 0.05       a-1
f      1     2     3     4     5     6     7     8     9
10     1.81  2.15  2.34  2.47  2.56  2.64  2.70  2.76  2.81
11     1.80  2.13  2.31  2.44  2.53  2.60  2.67  2.72  2.77
12     1.78  2.11  2.29  2.41  2.50  2.58  2.64  2.69  2.74
13     1.77  2.09  2.27  2.39  2.48  2.55  2.61  2.66  2.71
14     1.76  2.08  2.25  2.37  2.46  2.53  2.59  2.64  2.69
15     1.75  2.07  2.24  2.36  2.44  2.51  2.57  2.62  2.67
16     1.75  2.06  2.23  2.34  2.43  2.50  2.56  2.61  2.65
17     1.74  2.05  2.22  2.33  2.42  2.49  2.54  2.59  2.64
18     1.73  2.04  2.21  2.32  2.41  2.48  2.53  2.58  2.62
19     1.73  2.03  2.20  2.31  2.40  2.47  2.52  2.57  2.61
20     1.72  2.03  2.19  2.30  2.39  2.46  2.51  2.56  2.60
24     1.71  2.01  2.17  2.28  2.36  2.43  2.48  2.53  2.57
30     1.70  1.99  2.15  2.25  2.33  2.40  2.45  2.50  2.54
40     1.68  1.97  2.13  2.23  2.31  2.37  2.42  2.47  2.51
60     1.67  1.95  2.10  2.21  2.28  2.35  2.39  2.44  2.48
120    1.66  1.93  2.08  2.18  2.26  2.32  2.37  2.41  2.45
∞      1.64  1.92  2.06  2.16  2.23  2.29  2.34  2.38  2.42

β = 0.025      a-1
f      1     2     3     4     5     6     7     8     9
5      2.57  3.03  3.39  3.66  3.88  4.06  4.22  4.36  4.49
6      2.45  2.86  3.18  3.41  3.60  3.75  3.88  4.00  4.11
7      2.36  2.75  3.04  3.24  3.41  3.54  3.66  3.76  3.86
8      2.31  2.67  2.94  3.13  3.28  3.40  3.51  3.60  3.68
9      2.26  2.61  2.86  3.04  3.18  3.29  3.39  3.48  3.55
10     2.23  2.57  2.81  2.97  3.11  3.21  3.31  3.39  3.46
11     2.20  2.53  2.76  2.92  3.05  3.15  3.24  3.31  3.38
12     2.18  2.50  2.72  2.88  3.00  3.10  3.18  3.25  3.32
13     2.16  2.48  2.69  2.84  2.96  3.06  3.14  3.21  3.27
14     2.14  2.46  2.67  2.81  2.93  3.02  3.10  3.17  3.23
15     2.13  2.44  2.64  2.79  2.90  2.99  3.07  3.13  3.19
16     2.12  2.42  2.63  2.77  2.88  2.96  3.04  3.10  3.16
17     2.11  2.41  2.61  2.75  2.85  2.94  3.01  3.08  3.13
18     2.10  2.40  2.59  2.73  2.84  2.92  2.99  3.05  3.11
19     2.09  2.39  2.58  2.72  2.82  2.90  2.97  3.04  3.09
20     2.09  2.38  2.57  2.70  2.81  2.89  2.96  3.02  3.07
24     2.06  2.35  2.53  2.66  2.76  2.84  2.91  2.96  3.01
30     2.04  2.32  2.50  2.62  2.72  2.79  2.86  2.91  2.96
40     2.02  2.29  2.47  2.58  2.67  2.75  2.81  2.86  2.90
60     2.00  2.27  2.43  2.55  2.63  2.70  2.76  2.81  2.85
120    1.98  2.24  2.40  2.51  2.59  2.66  2.71  2.76  2.80
∞      1.96  2.21  2.37  2.47  2.55  2.62  2.67  2.71  2.75

β = 0.01       a-1
f      1     2     3     4     5     6     7     8     9
5      3.37  3.90  4.21  4.43  4.60  4.73  4.85  4.94  5.03
6      3.14  3.61  3.88  4.07  4.21  4.33  4.43  4.51  4.59
7      3.00  3.42  3.66  3.83  3.96  4.07  4.15  4.23  4.30
8      2.90  3.29  3.51  3.67  3.79  3.88  3.96  4.03  4.09
9      2.82  3.19  3.40  3.55  3.66  3.75  3.82  3.89  3.94
10     2.76  3.11  3.31  3.45  3.56  3.64  3.71  3.78  3.83
11     2.72  3.06  3.25  3.38  3.48  3.56  3.63  3.69  3.74
12     2.68  3.01  3.19  3.32  3.42  3.50  3.56  3.62  3.67
13     2.65  2.97  3.15  3.27  3.37  3.44  3.51  3.56  3.61
14     2.62  2.94  3.11  3.23  3.32  3.40  3.46  3.51  3.56
15     2.60  2.91  3.08  3.20  3.29  3.36  3.42  3.47  3.52
16     2.58  2.88  3.05  3.17  3.26  3.33  3.39  3.44  3.48
17     2.57  2.86  3.03  3.14  3.23  3.30  3.36  3.41  3.45
18     2.55  2.84  3.01  3.12  3.21  3.27  3.33  3.38  3.42
19     2.54  2.83  2.99  3.10  3.18  3.25  3.31  3.36  3.40
20     2.53  2.81  2.97  3.08  3.17  3.23  3.29  3.34  3.38
24     2.49  2.77  2.92  3.03  3.11  3.17  3.22  3.27  3.31
30     2.46  2.72  2.87  2.97  3.05  3.11  3.16  3.21  3.24
40     2.42  2.68  2.82  2.92  2.99  3.05  3.10  3.14  3.18
60     2.39  2.64  2.78  2.87  2.94  3.00  3.04  3.08  3.12
120    2.36  2.60  2.73  2.82  2.89  2.94  2.99  3.03  3.06
∞      2.33  2.56  2.68  2.77  2.84  2.89  2.93  2.97  3.00


11.1.2.2 Approximate Solutions for Non-normal Distributions and t = 1 

Let the random variables $x_i$ be distributed in populations $A_i$ with the distribution
function $F(x; \mu_i, \eta_{i2}, ..., \eta_{ip})$. The distribution of the $x_i$ may be such that for
the purposes of a practical investigation, it can adequately be characterised by
the expectation $\mu_i$ and the standard deviation $\sigma(\mu_i)$, and we have

$$F(x; \mu_i, \eta_{i2}, ..., \eta_{ip}) \approx G(x; \mu_i, \sigma(\mu_i)).$$

For random samples of size $n$, the sample means $\bar{x}_i$ are approximately normally
distributed with expectation $\mu_i$ and variance $\frac{\sigma^2(\mu_i)}{n}$. For $n \ge 30$ the
approximation is in most cases sufficient for practical purposes. For $t = 1$ from (11.10),
by taking into account the variance homogeneity with

$$\gamma = \frac{\sigma(\mu + d)}{\sigma(\mu)}$$

we obtain

$$P_0 \approx \int_{-\infty}^{\infty} \left[\Phi\!\left(\gamma y + \frac{d\sqrt{n}}{\sigma(\mu)}\right)\right]^{a-1} \varphi(y)\,dy. \qquad (11.16)$$
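The integral in (11.16) can be checked numerically. The sketch below assumes that (11.16) has the form $\int [\Phi(\gamma y + d\sqrt{n}/\sigma(\mu))]^{a-1} \varphi(y)\,dy$, where $\gamma$ is the ratio of the standard deviation of the best population to that of the others; this reading is an assumption made here, as are the function names and the integration settings. For $a = 2$ the integral has the closed form $\Phi\big(d^*/\sqrt{1 + \gamma^2}\big)$ with $d^* = d\sqrt{n}/\sigma(\mu)$, which gives a simple test of the code.

```python
import math


def Phi(z: float) -> float:
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z: float) -> float:
    """Density function of N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def approx_prob_correct(a: int, gamma: float, d_star: float,
                        lo: float = -10.0, hi: float = 10.0,
                        steps: int = 4000) -> float:
    """Assumed form of (11.16), integrated with Simpson's rule (steps even)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * Phi(gamma * y + d_star) ** (a - 1) * phi(y)
    return total * h / 3.0


# a = 2, gamma = 0.6 and the value 1.918 tabulated for 1 - beta = 0.95
p = approx_prob_correct(2, 0.6, 1.918)
print(round(p, 4))
```

Under this reading, the tabulated value 1.918 for $a = 2$, $\gamma = 0.6$ and $1 - \beta = 0.95$ is reproduced to three decimals.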
Table 11.5 Approximate values of $d\sqrt{n}/\sigma(\mu)$ for the selection of the population with the
largest expectation from $a$ populations for the given minimum probability $1 - \beta$ of a correct
selection with $\gamma = \frac{\sigma(\mu + d)}{\sigma(\mu)}$.

1-β = 0.90     γ
a      0.6    0.7    0.8    0.9    1      1.1    1.2    1.3    1.4
2      1.495  1.564  1.641  1.724  1.812  1.905  2.002  2.102  2.205
3      1.770  1.877  1.990  2.108  2.230  2.357  2.487  2.620  2.757
4      1.914  2.041  2.173  2.310  2.452  2.597  2.745  2.896  3.050
5      2.010  2.150  2.296  2.446  2.600  2.757  2.918  3.081  3.247
6      2.081  2.231  2.387  2.547  2.710  2.877  3.047  3.219  3.394
7      2.136  2.295  2.459  2.626  2.797  2.971  3.149  3.329  3.511
8      2.182  2.348  2.518  2.692  2.869  3.050  3.233  3.419  3.607
9      2.221  2.393  2.568  2.747  2.930  3.116  3.304  3.496  3.689
10     2.255  2.431  2.611  2.796  2.983  3.173  3.366  3.562  3.760

1-β = 0.95     γ
a      0.6    0.7    0.8    0.9    1      1.1    1.2    1.3    1.4
2      1.918  2.008  2.106  2.213  2.326  2.445  2.569  2.698  2.830
3      2.178  2.300  2.430  2.567  2.710  2.858  3.011  3.169  3.329
4      2.315  2.456  2.603  2.757  2.916  3.081  3.249  3.422  3.599
5      2.407  2.560  2.719  2.885  3.055  3.231  3.410  3.594  3.781
6      2.475  2.637  2.806  2.980  3.159  3.343  3.531  3.723  3.918
7      2.528  2.699  2.875  3.056  3.242  3.432  3.621  3.825  4.027
8      2.572  2.749  2.931  3.118  3.310  3.506  3.706  3.910  4.117
9      2.610  2.792  2.979  3.171  3.368  3.569  3.774  3.982  4.194
10     2.642  2.829  3.021  3.217  3.418  3.623  3.832  4.045  4.260

1-β = 0.99     γ
a      0.6    0.7    0.8    0.9    1      1.1    1.2    1.3    1.4
2      2.713  2.840  2.979  3.130  3.290  3.458  3.634  3.816  4.002
3      2.945  3.097  3.261  3.435  3.617  3.808  4.005  4.209  4.418
4      3.070  3.237  3.415  3.602  3.797  4.000  4.210  4.426  4.647
5      3.155  3.332  3.519  3.715  3.920  4.131  4.350  4.574  4.804
6      3.218  3.403  3.598  3.801  4.012  4.231  4.455  4.686  4.922
7      3.268  3.460  3.660  3.869  4.086  4.310  4.540  4.776  5.017
8      3.309  3.506  3.712  3.926  4.147  4.376  4.611  4.851  5.096
9      3.344  3.546  3.756  3.974  4.200  4.432  4.671  4.915  5.164
10     3.375  3.581  3.795  4.017  4.246  4.481  4.723  4.971  5.223


Selection Procedures 


The selection rule below is a modification by Chambers and Jarratt (1964) of
Selection Rule 11.2.


Selection Rule 11.2a Take from each population $A_i$ a random sample of size
$n_0$ ($10 < n_0 < 30$), determine the maximum sample mean $\bar{x}_{(a)}$ and use it as
an estimate of $\mu_{(a)}$. Determine the sample size $n$ per population with $\sigma(\bar{x}_{(a)})$ in place
of $\sigma(\mu_{(a)})$ so that the integral in (11.16) is not below $1 - \beta$. Then observe (if $n > n_0$)
$n - n_0$ further values from each population. We then say that the population
with the largest sample mean calculated from the $n$ observations is best.


In Selection Rule 11.2a it was assumed that the function $\sigma(\mu)$ is known. If $x$ is
$B(n, p)$-distributed, we have $\sigma(\mu) = \sqrt{\mu(1-\mu)}$; if $x$ is $P(\lambda)$-distributed, we have
$\sigma(\mu) = \sqrt{\mu}$. If $\sigma(\mu)$ is unknown, we estimate it by regression of $s$ on $\bar{x}$. But for
non-normal continuous distributions, we can also use the method described
in Section 11.1.2.1 because it is robust against non-normality, as shown in
Domröse and Rasch (1987).


The values $d\sqrt{n}/\sigma(\mu)$ needed in (11.16) can be found in Table 11.5.


11.1.3 Selection of a Subset Containing the Best Population with
Given Probability


We now discuss Problem 2 of Section 11.1.1 for $t = 1$, Y=, = y and $\Omega = R^1$. Let
$y_i$ in $P_i$ be continuously distributed with distribution function $F(y, \theta_i)$ and density
function $f(y, \theta_i)$. Let $F$ and $f$ be known but the $\theta_i$ of the $P_i$ be unknown. We
assume that $g^*(\theta) = \theta$.

In Problem 2 we have to find a (non-empty) subset $(A_{i_1}, \ldots, A_{i_r}) = M_G$ of the
populations $A_1, \ldots, A_a$ so that the probability of a correct selection $P(CS)$, that is,
the probability that the best population (with parameter $\theta_{(a)}$) is in the subset, is at
least $1 - \beta$. Again as in Section 11.1.2, we assume $\frac{1}{a} < 1 - \beta < 1$. If for more than
one $P_i$ the parameter equals $\theta_{(a)}$, any of them is called the best.

The following selection rule (class of selection rules) stems from Gupta and 
Panchapakesan (1970, 1979). 


Selection Rule 11.3 First we select a proper estimator $\hat\eta$ of the unknown
parameters $\eta$ (and $\theta$). With $H(\hat\eta, \eta)$ and $h(\hat\eta, \eta)$, we denote the distribution
function and the density function of $\hat\eta$, respectively. We assume that for
$\eta > \eta'$, always $H(\hat\eta, \eta) \le H(\hat\eta, \eta')$ and for at least one $\hat\eta$, we have $H(\hat\eta, \eta) < H(\hat\eta, \eta')$.

Further let $d_{u,v}(x)$ be a real differentiable function with parameters $u \ge 1$,
$v \ge 0$, so that for each $x$ from the domain of definition $\Omega$ of $H(x, \eta)$, the
conditions below are fulfilled:

- $d_{u,v}(x) \ge x$,
- $d_{1,0}(x) = x$,
- $d_{u,v}(x)$ is continuous in $u$ and $v$,

and at least one of the relations

$$\lim_{v \to \infty} d_{u,v}(x) = \infty \quad \text{for given } u,$$

$$\lim_{u \to \infty} d_{u,v}(x) = \infty \quad \text{for given } v \text{ and } x \ne 0$$

is valid.
Then $M_G$ contains all populations $A_i$ for which

$$d_{u,v}(\hat\eta_i) \ge \hat\eta_{(a)}.$$


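Analogous to (11.10), we obtain for Selection Rule 11.3

$$P(CS) \ge \int_{\Omega} \left\{H[d_{u,v}(\hat\eta), \eta_{(a)}]\right\}^{a-1} h(\hat\eta, \eta_{(a)})\, d\hat\eta. \quad (11.17)$$

We put

$$\int_{\Omega} \left\{H[d_{u,v}(\hat\eta), \eta]\right\}^{a-1} h(\hat\eta, \eta)\, d\hat\eta = I(\eta, u, v, a) \quad (11.18)$$

so that (11.17) can be written as $P(CS) \ge I(\eta_{(a)}, u, v, a)$. For $I$ in (11.18) it follows
from the conditions of Selection Rule 11.3:

$$I(\eta, u, v, a) \ge \frac{1}{a}, \qquad I(\eta, 1, 0, a) = \frac{1}{a},$$

$$\text{either } \lim_{v \to \infty} I(\eta, u, v, a) = 1 \text{ for fixed } u \quad \text{or } \lim_{u \to \infty} I(\eta, u, v, a) = 1 \text{ for fixed } v \quad (11.19)$$

(or both).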


From (11.19) it follows that $u$ and $v$ can be chosen appropriately so that
$P(CS) \ge 1 - \beta$ can be fulfilled for each $\beta$. This leads to


Theorem 11.3 For a continuous random variable $\hat\eta$ with $H(\hat\eta, \eta) \ge H(\hat\eta, \eta')$ for
$\eta < \eta' \in \Omega = R^1$ and $t = 1$, Problem 2 of Section 11.1.1 is solvable with Selection
Rule 11.3 for all $\beta$ with $\frac{1}{a} < 1 - \beta < 1$.


Gupta and Panchapakesan (1970) proved the theorem below under the
assumption that ($\eta_i \le \eta_j$)

$$\frac{\partial}{\partial \eta_i} H[d_{u,v}(\hat\eta), \eta_i]\, h(\hat\eta, \eta_j) - d'_{u,v}(\hat\eta)\, \frac{\partial}{\partial \eta_i} H(\hat\eta, \eta_i)\, h[d_{u,v}(\hat\eta), \eta_j] \ge 0. \quad (11.20)$$


Theorem 11.4 (Gupta and Panchapakesan).

By Selection Rule 11.3 and with the assumptions of Theorem 11.3 and (11.20),
the supremum of the expectations $E(r)$ and $E(w)$ is taken for $\eta_1 = \cdots = \eta_a$. Here
$w$ is the number of those $A_i$ in $M_G$ obtained by Selection Rule 11.3, not having
the largest parameter $\eta_{(a)}$.


Therefore $\eta_1 = \cdots = \eta_a$ is the least favourable parameter constellation for
Problem 2.

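We now consider the special case that $\eta$ is a location parameter with
$\Omega = (-\infty, \infty)$. Then essential simplifications appear, because $H(\hat\eta, \eta) =
G(\hat\eta - \eta)$ ($-\infty < \eta < \infty$). Then (11.20) with

$$d'_{u,v}(\hat\eta) = \frac{\partial d_{u,v}(\hat\eta)}{\partial \hat\eta}$$

becomes

$$d'_{u,v}(\hat\eta)\, h(\hat\eta, \eta_i)\, h[d_{u,v}(\hat\eta), \eta_j] - h(\hat\eta, \eta_j)\, h[d_{u,v}(\hat\eta), \eta_i] \ge 0.$$

If the distribution of $\hat\eta$ has a monotone likelihood ratio in $\eta$, then (11.20) is
fulfilled. An appropriate choice of $d_{u,v}(\hat\eta)$ is $d(\hat\eta) = \hat\eta + d$ ($u = 1$, $v = d$) with
$\hat\eta = \bar{x}$ and $\eta = \mu$, so that by Selection Rule 11.3, all the $A_i$ are put in $M_G$ for which

$$\bar{x}_{i.} \ge \bar{x}_{(a).} - d \quad (\bar{x}_{(a).}\ \text{largest sample mean}). \quad (11.21)$$

We have to choose $d$ so that

$$\int_{-\infty}^{\infty} [G(u + d)]^{a-1}\, g(u)\, du = 1 - \beta, \quad (11.22)$$

where $g$ is the density function belonging to $G$.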
Another important special case is that $\theta$ is a scale parameter and $H(\hat\eta, \eta) =
G(\hat\eta/\eta)$. Then $\Omega = [0, \infty)$ and $\eta \ge 0$, and (11.20) with $\hat\eta = s^2$, $\eta = \sigma^2$ becomes

$$s^2 d'_{u,v}(s^2)\, h(s^2, \sigma_i^2)\, h[d_{u,v}(s^2), \sigma_j^2] - d_{u,v}(s^2)\, h[d_{u,v}(s^2), \sigma_i^2]\, h(s^2, \sigma_j^2) \ge 0.$$

If the distribution of $y$ has a monotone likelihood ratio and

$$s^2 d'_{u,v}(s^2) \ge d_{u,v}(s^2) \ge 0,$$

then (11.20) is fulfilled. Therefore

$$d_{u,v}(s^2) = u s^2 \quad (u > 1)$$

is a possible (and often used) choice of $d_{u,v}(s^2)$.


11.1.3.1 Selection of the Normal Distribution with the Largest Expectation 

The most important special case is that $x$ is $N(\mu, \sigma^2)$-distributed; $\sigma^2$ may be
unknown. From $n$ observations from each of the $A_i$ ($i = 1, \ldots, a$), the sample
means $\bar{x}_{i.}$ are calculated. The likelihood ratio of the normal distribution and that
of the $t$-distribution are monotone for known as well as for unknown $\sigma^2$. Therefore
a selection rule, 'Choose for $M_G$ all $A_i$ with $\bar{x}_{i.} \ge \bar{x}_{(a).} - d$', can be used. $\bar{x}_{(a).}$ is
the largest sample mean. We start with the case where $\sigma^2$ is known. We put
$d = D\sigma/\sqrt{n}$ with a $D$ (analogous to (11.22)) so that

$$1 - \beta = \int_{-\infty}^{\infty} [\Phi(u + D)]^{a-1} \varphi(u)\, du, \quad (11.23)$$

where $\Phi$ and $\varphi$ are the distribution function and the density function of the
standardised normal distribution, respectively. If $\sigma^2$ is unknown, we write
approximately $d = Ds/\sqrt{n}$, where $s^2$ is an estimate of $\sigma^2$, based on $f$ degrees of freedom.
(11.23) is replaced by

$$1 - \beta = \int_0^{\infty}\!\int_{-\infty}^{\infty} [\Phi(u + Dy)]^{a-1} \varphi(u)\, h_f(y)\, du\, dy, \quad (11.24)$$

where $h_f(y)$ is the density function of $\sqrt{\chi^2/f}$ and $\chi^2$ is $CS(f)$-distributed.


In Table 11.2 the values $D = d\sqrt{n}/\sigma$ fulfilling (11.23) are given in dependency of $a$
and $\beta$ for $t = 1$. If the experimenter selected the values $d$, $a$ and $\beta$, then $n$ can be
calculated by (11.14).
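The constant $D$ in (11.23) can also be obtained numerically instead of being read from a table. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the integral is truncated to $[-8, 8]$ and evaluated with Simpson's rule, and the root is found by bisection on $[0, 10]$. For $a = 2$ the integral reduces to $\Phi(D/\sqrt{2})$, which gives a simple check.

```python
import math


def std_normal_pdf(u):
    """Density of the standardised normal distribution."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(u):
    """Distribution function of the standardised normal distribution."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def prob_correct_selection(D, a, lower=-8.0, upper=8.0, steps=2000):
    """Right-hand side of (11.23), composite Simpson rule (steps must be even).

    The truncation to [lower, upper] is an assumption of this sketch.
    """
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        u = lower + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * std_normal_cdf(u + D) ** (a - 1) * std_normal_pdf(u)
    return total * h / 3.0


def solve_D(a, prob, tol=1e-7):
    """Smallest D with prob_correct_selection(D, a) >= prob (bisection).

    Assumes 1/a < prob < 1, as required in the text.
    """
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_correct_selection(mid, a) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for a in (2, 3, 5, 10):
        print(a, round(solve_D(a, 0.95), 3))
```

With $D$ in hand and $\sigma$ known, the rule 'choose all $A_i$ with $\bar{x}_{i.} \ge \bar{x}_{(a).} - D\sigma/\sqrt{n}$' can be applied directly to the observed sample means.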

For independent random samples from a populations with normal distribu- 
tion and known variance, Problem 1 for t= 1 is solved by Selection Rule 11.1, 
and Problem 2 is solved by Selection Rule 11.3, leading to the same sample size. 


11.1.3.2 Selection of the Normal Distribution with Smallest Variance 
Let the random variable $x$ in $P_i$ be $N(\mu_i, \sigma_i^2)$-distributed. From $n$ observations
from each population $P_i$ ($i = 1, \ldots, a$) with known $\mu_i$,

$$q_i = \frac{1}{n} \sum_{j=1}^{n} (x_{ij} - \mu_i)^2$$

and with unknown $\mu_i$,

$$q_i = \frac{1}{n-1} \sum_{j=1}^{n} (x_{ij} - \bar{x}_{i.})^2$$

are calculated. $q_i$ will be used to select the population with the smallest variance;
each $q_i$ has the same number $f$ of degrees of freedom (if $\mu_i$ is known we have $f = n$,
and if $\mu_i$ is unknown then $f = n - 1$).

We use $d_{u,v}(q) = zq$, and the selection for the smallest variance follows from
Selection Rule 11.4.


Selection Rule 11.4 Put into $M_G$ all $A_i$ for which

$$s_i^2 \le z\, s_{(1)}^2 = \frac{1}{z^*}\, s_{(1)}^2 \quad \left(\frac{1}{z} = z^* \le 1\right);$$

$s_{(1)}^2$ is the smallest sample variance. $z^* = z(f, a, \beta)$ depends on the degrees of
freedom $f$, on the number $a$ of populations and on $1 - \beta$.

For $z^*$ we choose the largest number so that the right-hand side of (11.17)
equals $1 - \beta$. We have to calculate $P(CS)$ for the least favourable case given
by $\sigma_{(1)}^2 = \sigma_{(a)}^2$ (monotonicity of the likelihood ratio is given). We denote
the estimates of $\sigma_i^2$ as usual by $s_i^2$ and formulate


Theorem 11.5 Let the $y_i$ in $a$ populations be $N(\mu_i, \sigma_i^2)$-distributed. There may
be independent estimators $s_i^2$ of $\sigma_i^2$ with $f$ degrees of freedom each. Select from
the $a$ populations a subset $M_G$ so that it contains the smallest variance $\sigma_{(1)}^2$ at
least with probability $1 - \beta$. Using Selection Rule 11.4 with an appropriately
chosen $z^* = z(f, a, \beta)$, the probability of a correct selection $P(CS)$ then fulfils

$$P(CS) \ge \int_0^{\infty} [1 - G_f(z^* v)]^{a-1}\, g_f(v)\, dv. \quad (11.25)$$

In (11.25) $G_f$ and $g_f$ are the distribution function and the density function of the
central $\chi^2$-distribution with $f$ degrees of freedom, respectively.


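Proof: If $s_i^2$ is the estimate of $\sigma_{(i)}^2$, we have (because $z^* \le 1$)

$$P(CS) = P\left\{ s_1^2 \le \frac{1}{z^*} \min\left(s_2^2, \ldots, s_a^2\right) \right\}$$

and further

$$P(CS) = \int_0^{\infty} g_f(v) \prod_{j=2}^{a} \left[1 - G_f\!\left(z^* \frac{\sigma_{(1)}^2}{\sigma_{(j)}^2}\, v\right)\right] dv. \quad (11.26)$$

If $\sigma_{(1)}^2 = \sigma_{(2)}^2 = \cdots = \sigma_{(a)}^2$, then $P(CS)$ is minimum and this completes the proof.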
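The constant $z^*$ can be computed numerically from the least favourable case, that is, from the integral $\int_0^\infty [1 - G_f(z^* v)]^{a-1} g_f(v)\,dv = 1 - \beta$. The sketch below is an illustration under stated assumptions, not the method used to produce the book's table: function names are hypothetical, $f$ is restricted to even values (for which the $\chi^2$ survival function is a finite sum), the integral is truncated at a generous upper limit and evaluated with Simpson's rule, and $z^*$ is found by bisection on $[0, 1]$. For $f = 2$ the integral equals $1/(1 + (a-1)z^*)$, which serves as a check.

```python
import math


def chi2_sf_even(x, f):
    """P(chi-square with even f degrees of freedom > x), finite-sum form."""
    half = 0.5 * x
    term, total = 1.0, 1.0
    for k in range(1, f // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def chi2_pdf(v, f):
    """Density of the central chi-square distribution with f degrees of freedom."""
    if v <= 0.0:
        # At v = 0 the density is 1/2 for f = 2 and 0 for f > 2.
        return 0.5 if (f == 2 and v == 0.0) else 0.0
    log_pdf = ((0.5 * f - 1.0) * math.log(v) - 0.5 * v
               - 0.5 * f * math.log(2.0) - math.lgamma(0.5 * f))
    return math.exp(log_pdf)


def pcs_least_favourable(z, f, a, steps=4000):
    """Integral of [1 - G_f(z v)]^(a-1) g_f(v) over v >= 0 (Simpson rule).

    The upper integration limit is an assumption chosen to cover the tail.
    """
    upper = f + 12.0 * math.sqrt(2.0 * f) + 40.0
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        v = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * chi2_sf_even(z * v, f) ** (a - 1) * chi2_pdf(v, f)
    return total * h / 3.0


def solve_z(f, a, prob, tol=1e-7):
    """Largest z in [0, 1] for which the integral is still >= prob."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs_least_favourable(mid, f, a) > prob:
            lo = mid  # probability still above target, z may grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(1e4 * solve_z(2, 2, 0.90)))   # compare with Table 11.6
    print(round(1e4 * solve_z(10, 2, 0.95)))
```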


Table 11.6 shows the values $z^* = z(f, a, \beta)$ that make the right-hand side of
(11.25) equal to $1 - \beta$. Approximately, $z^* = z(f, a, \beta)$ can also be obtained from
Table 11.2 using


Vie =o 1)n=., 


Table 11.6 Values of $10^4 z^* = 10^4 z(f, a, \beta)$, for which the right-hand side of (11.25) equals
$1 - \beta$.


36 7972 7074 6640 6363 6164 6011 5887 5784 5696 5619 
38 8021 7142 6715 6444 6248 6098 5976 5874 5788 5712 
40 8067 7205 6786 6519 6327 6178 6058 5958 5873 5799 
42 8109 7264 6852 6590 6400 6254 6136 6038 5952 5880 
44 8149 7319 6914 6656 6470 6326 6209 6112 6029 5957 
46 8186 7371 6973 6718 6534 6393 6278 6182 6100 6029 
48 8221 7420 7028 6777 6596 6456 6343 6248 6167 6097 
50 8254 7466 7080 6832 6654 6516 6404 6311 6231 6162 


$1 - \beta = 0.90$
 2  1111  0556  0370  0278  0222  0185  0159  0139  0123  0111
 4  2435  1630  1297  1106  0979  0886  0816  0759  0713  0674
 6  3274  2417  2039  1813  1657  1541  1450  1377  1315  1263
 8  3862  3002  2610  2370  2202  2076  1976  1894  1826  1766
10  4306  3457  3062  2818  2645  2515  2410  2325  2252  2190
12  4657  3825  3433  3188  3014  2881  2775  2688  2613  2549
14  4944  4132  3744  3501  3327  3194  3087  2999  2924  2859
16  5186  4392  4011  3770  3597  3464  3358  3270  3194  3129
18  5394  4618  4243  4004  3833  3702  3596  3508  3433  3368
20  5575  4816  4447  4112  4043  3913  3808  3720  3646  3581
22  5734  4992  4629  4397  4230  4101  3997  3911  3837  3772
24  5876  5149  4792  4564  4399  4272  4169  4083  4010  3946
26  6004  5291  4940  4715  4553  4427  4325  4240  4168  4104
28  6119  5420  5076  4854  4693  4569  4468  4384  4312  4250
30  6225  5539  5199  4981  4822  4700  4600  4517  4446  4384
32  6322  5648  5314  5098  4942  4820  4722  4640  4570  4508
34  6411  5749  5419  5207  5052  4933  4836  4754  4684  4624
36  6493  5842  5518  5308  5156  5037  4941  4861  4792  4732
38  6570  5929  5609  5402  5252  5135  5040  4960  4892  4833
40  6642  6011  5695  5491  5342  5227  5133  5054  4987  4928
42  6709  6087  5776  5574  5427  5313  5220  5142  5076  5017
44  6772  6159  5852  5653  5508  5394  5303  5226  5160  5102
46  6831  6227  5924  5727  5583  5472  5381  5304  5239  5182
48  6887  6291  5992  5797  5655  5544  5454  5379  5314  5258
50  6940  6352  6056  5863  5723  5614  5525  5450  5386  5330

$1 - \beta = 0.95$
2 0526 0263 0175 0132 0105 0088 0075 0066 0058 0053 
4 1565 1062 0851 0728 0646 0586 0540 0504 0473 0448 
6 2334 1749 1486 1327 1217 1134 1069 1017 0972 0935 
8 2909 2293 2007 1830 1706 1612 1573 1476 1424 1379 
10 3358 2732 2436 2250 2119 2018 1938 1872 1815 1767 
12 3722 3096 2796 2606 2470 2366 2283 2214 2155 2104 
14 4026 3405 3103 2911 2774 2668 2583 2512 2452 2399 
16 4285 3671 3370 3178 3039 2933 2847 2775 2714 2661 
18 4510 3903 3604 3413 3274 3168 3081 3009 2947 2894 
20 4708 4109 3813 3622 3484 3378 3291 3219 3157 3104 
22 4883 4294 4000 3811 3674 3568 3481 3409 3348 3294 
24 5041 4460 4170 3982 3846 3740 3654 3582 3521 3467 
26 5184 4611 4324 4138 4003 3898 3812 3741 3680 3626 
28 5313 4749 4465 4281 4147 4043 3958 3887 3826 3773 
30 5432 4876 4595 4413 4280 4177 4093 4022 3962 3909 
32 5542 4993 4716 4536 4404 4302 4218 4148 4088 4036 
34 5643 5102 4828 4649 4519 4418 4335 4265 4206 4154 
36 5737 5203 4932 4756 4627 4526 4444 4375 4316 4264
38 5825 5298 5030 4855 4728 4628 4546 4478 4419 4368 
40 5907 5387 5122 4949 4822 4724 4643 4575 4517 4466 
42 5984 5470 5208 5037 4912 4814 4734 4667 4609 4558 
44 6057 5549 5290 5120 4996 4899 4820 4753 4696 4646 
46 6126 5624 5367 5199 5076 4980 4901 4835 4778 4729 
48 6190 5694 5440 5274 5152 5057 4979 4913 4857 4808 
50 6252 5761 5510 5345 5224 5130 5053 4988 4932 4883 

$1 - \beta = 0.99$
2 0101 0051 0034 0025 0020 0017 0014 0013 0011 0010
4 0626 0434 0351 0302 0269 0245 0226 0211 0199 0189
6 1181 0907 0779 0701 0646 0605 0572 0545 0522 0503
8 1659 1339 1186 1089 1024 0968 0926 0891 0862 0837
10 2062 1717 1548 1440 1362 1303 1255 1215 1181 1152
12 2407 2046 1867 1752 1668 1604 1552 1508 1472 1439
14 2704 2334 2149 2029 1942 1874 1820 1774 1734 1700 
16 2966 2590 2401 2278 2188 2118 2061 2014 1973 1937 
18 3197 2819 2627 2501 2410 2338 2280 2232 2190 2153 
20 3404 3025 2831 2704 2612 2539 2480 2431 2388 2351 
22 3591 3212 3017 2890 2796 2723 2663 2613 2570 2532 
24 3761 3382 3188 3060 2966 2892 2832 2782 2738 2700 
26 3916 3539 3344 3216 3122 3048 2988 2937 2894 2855 
28 4059 3684 3490 3362 3268 3194 3133 3082 3038 3000 
30 4191 3818 3635 3497 3403 3329 3268 3217 3173 3135
32 4314 3943 3750 3623 3529 3455 3395 3344 3300 3261 
34 4428 4060 3868 3741 3648 3574 3513 3462 3418 3380 
36 4535 4169 3979 3852 3759 3685 3625 3574 3530 3492
38 4636 4272 4089 3957 3864 3791 3730 3680 3636 3598 
40 4730 4369 4181 4056 3963 3890 3830 3780 3736 3698 
42 4819 4461 4274 4149 4057 3984 3925 3874 3831 3793 
44 4903 4548 4362 4238 4146 4074 4014 3964 3921 3883 
46 4983 4630 4445 4322 4231 4159 4100 4050 4007 3969 
48 5059 4709 4525 4402 4312 4240 4181 4132 4089 4051 
50 5131 4784 4601 4479 4389 4318 4259 4210 4167 4129 


11.2 Multiple Comparisons 


We know from Chapter 3 that in a statistical test concerning a parameter $\theta \in \Omega$, a
null hypothesis $H_0: \theta \in \omega$ is tested against an alternative hypothesis $H_A: \theta \in \Omega \setminus \omega$,
and one has to decide between $H_0$ and $H_A$. If however the parameter space $\Omega$ is
partitioned into more than two disjoint subsets $\omega_1, \ldots, \omega_r$, $\bigcup_{i=1}^{r} \omega_i = \Omega$, we can
call one of the hypotheses $H_i: \theta \in \omega_i$ the null hypothesis. For instance, we can accept
one (null) hypothesis ($H_1: \theta \in \omega_1$) or reject it ($H_2: \theta \in \omega_2$) or make no statement
($H_3: \theta \in \omega_3$) ($\omega_3$ is then an indifference zone).

Real multiple decision problems (with more than two decisions) are
present. If the results of several tests are considered simultaneously, their risks
must be evaluated jointly. Of course we cannot give a full overview of the
methods available in this field. For more details, see Miller (1981) and
Hsu (1996).

We restrict ourselves in this section to hypotheses about expectations of normal
distributions. The set of populations $G = (P_1, \ldots, P_a)$ can, for instance, be
interpreted as levels in a simple analysis of variance model I or as level
combinations in higher classifications.

Let the random variable $y_i$ in $P_i$ be $N(\mu_i, \sigma^2)$-distributed. We assume that from
each population $P_i$, an independent random sample $Y_i^T = (y_{i1}, \ldots, y_{in_i})$ ($i = 1, \ldots, a$)
of size $n_i$ is obtained. Concerning the $\mu_i$ we consider several problems.


Problem 3 The null hypothesis

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$$

has to be tested against the alternative hypothesis

$H_A$: there exists at least one pair $(i, j)$ with $i \ne j$ for which $\mu_i \ne \mu_j$

with a given first kind risk $\alpha_e$.


Problem 4 Each of the $\binom{a}{2}$ null hypotheses

$$H_{0ij}: \mu_i = \mu_j \quad (i \ne j;\ i, j = 1, \ldots, a)$$

has to be tested against the alternative hypothesis

$$H_{Aij}: \mu_i \ne \mu_j,$$

where the first kind risk $\alpha_{ij}$ is given. Often we choose $\alpha_{ij} = \alpha$. If we perform $\binom{a}{2}$
$t$-tests, then we speak about the multiple $t$-procedure.
If for each $i \ne j$ the null hypothesis $H_{0ij}$ is correct, then $H_0$ in Problem 3 is also
correct. Therefore one is often interested in the probability $1 - \alpha_e$ that none of the
$\binom{a}{2}$ null hypotheses $H_{0ij}$ is wrongly rejected. We call the $\alpha_{ij}$ error probabilities
per comparison (comparison-wise risk of the first kind) and $\alpha_e$ error probability
per experiment (global error probability or experiment-wise risk of the
first kind).


Problem 5 One of the populations (w.l.o.g. $P_a$) is prominent (a standard
method, a control treatment and so on). Each of the $a - 1$ null hypotheses

$$H_{0i}: \mu_i = \mu_a \quad (i = 1, \ldots, a-1)$$

has to be tested against the alternative hypothesis

$$H_{Ai}: \mu_i \ne \mu_a.$$

Again the first kind risk $\alpha_i$ is given in advance. Often we choose $\alpha_i = \alpha$. As in
Problem 4 we like to know the probability $1 - \alpha_e$ that none of the $a - 1$ null
hypotheses is wrongly rejected; again we call $\alpha_e$ the experiment-wise risk of
the first kind.

If we use the term experiment-wise risk of the first kind in Problems 4 and 5,
we must know that it is no first kind risk of a test. Instead $\alpha_e$ is the probability
that at least one of the $\binom{a}{2}$ and $a - 1$ null hypotheses, respectively, is wrongly
rejected. Let us consider all possible pairs (null hypothesis, alternative
hypothesis) of Problem 4 or 5. Then we have a multiple decision problem with more
than two possible decisions if $a > 2$.

In general $\alpha_e$ and $\alpha$ cannot be converted into each other. In Table 11.7 it is
shown how $\alpha_e$ increases if the number of pairs of hypotheses $k$ in Problem 4
or 5 is increasing. For calculating the values of Table 11.7, the asymptotic
(for known $\sigma^2$) relations for $k$ orthogonal contrasts

$$\alpha_e = 1 - (1 - \alpha)^k, \quad (11.27)$$

$$\alpha = 1 - (1 - \alpha_e)^{1/k} \quad (11.28)$$

have been used. (11.27) and (11.28) follow from elementary rules of probability
theory, because we can assign to the independent contrasts independent $F$-tests
(transformed $z$-tests) with $f_1 = 1$, $f_2 = \infty$ degrees of freedom.
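Relations (11.27) and (11.28) are easy to evaluate directly. The short sketch below (function names are hypothetical) recomputes a few entries of the kind listed in Table 11.7 for $\alpha = 0.05$ and $\alpha_e = 0.05$; it assumes $k$ independent comparisons, exactly as the relations themselves do.

```python
def experimentwise_risk(alpha, k):
    """(11.27): alpha_e for k independent tests, each with risk alpha."""
    return 1.0 - (1.0 - alpha) ** k


def comparisonwise_risk(alpha_e, k):
    """(11.28): alpha per test so that k independent tests give alpha_e."""
    return 1.0 - (1.0 - alpha_e) ** (1.0 / k)


if __name__ == "__main__":
    for k in (1, 2, 5, 10, 50):
        print(k,
              round(1e4 * experimentwise_risk(0.05, k)),
              round(1e5 * comparisonwise_risk(0.05, k)))
```

The two functions are inverse to each other in their first argument, so `experimentwise_risk(comparisonwise_risk(0.05, k), k)` returns 0.05 up to rounding error.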


Definition 11.4 A (linear) contrast $L_r$ is a linear function
$L_r = \sum_{i=1}^{a} c_{ri}\mu_i$ with the condition $\sum_{i=1}^{a} c_{ri} = 0$.
Two linear contrasts $L_u$ and $L_v$ are called orthogonal if $\sum_{i=1}^{a} c_{ui}c_{vi} = 0$.


Before we solve Problems 3-5, we first construct confidence intervals for dif- 
ferences of expectation as well as for linear contrasts in these expectations. With 
these confidence intervals, the problems can be handled. 

Most of the tables mentioned below can also be found in Rasch et al. (2008). 


Program Hint 
If data are present in an SPSS file, most of the methods below (and more) can be 
applied by the commands 


Analyze 
Compare Means 
One-Way ANOVA
or for higher classifications by


Analyze 
General Linear Model 
Univariate 


Table 11.7 Asymptotic relation between the comparison-wise risk of the first kind ($\alpha$) and the
experiment-wise risk of the first kind ($\alpha_e$) for $k$ orthogonal contrasts.

Columns: $k$; $10^4\alpha_e$ for $\alpha = 0.05$; $10^5\alpha$ for $\alpha_e = 0.05$.

    k   10^4 α_e   10^5 α
    1      500      5000
    2      975      2532
    3     1426      1695
    4     1855      1274
    5     2262      1021
    6     2649       851
    7     3017       730
    8     3366       639
    9     3698       568
   10     4013       512
   12     4596       427
   15     5367       341
   20     6415       256
   30     7854       171
   50     9231       103
   80     9835        64
  100     9941        51
  200     9999.6      26
  500    10000        10
 1000    10000         5
 5000    10000         1


If we use packages like R, SPSS or SAS, no tables of the quantiles are needed, 
because the packages give the correct significance value. 


11.2.1 Confidence Intervals for All Contrasts: Scheffé’s Method 


As we know from Chapter 5, Problem 3 is solved by the F-test of the one-way 
analysis of variance. 

Using the notation of Chapter 4, Problem 3 is, with $\beta^T = (\mu, \alpha_1, \ldots, \alpha_a)$ and $X$
from Example 4.1, a special case of

$H_0: X\beta \in \omega \subset \Omega$ with $\dim(\omega) = 1$, $\dim(\Omega) = a$, $\Omega = R[X]$,
against
$H_A: X\beta \notin \omega$.


We reformulate Problem 3 as a problem of constructing confidence intervals. If $H_0$
is correct, then all linear contrasts in the $\mu_i = \mu + \alpha_i$ equal zero. Conversely, if
all linear contrasts vanish, the validity of $H_0$ follows (see Section 4.1.4).




Therefore confidence intervals $K_r$ for all linear contrasts $L_r$ can be constructed
in such a way that the probability that $L_r \in K_r$ for all $r$ is at least $1 - \alpha_e$. We then
reject $H_0$ with a first kind risk $\alpha_e$ if at least one of the $K_r$ does not cover $L_r = 0$.

The method proposed in Scheffé (1953) allows the calculation of simultaneous
confidence intervals for all linear contrasts for $\beta$ in Equation (5.1), lying in a
subspace $\omega \subset \Omega$ where $\Omega$ is the rank space $R[X]$ of $X$ in (4.1). The confidence
coefficient $1 - \alpha_e$ is the probability that all linear contrasts in $\omega$ lie in the
corresponding confidence interval. This confidence interval can easily be derived
from Theorems 4.6 and 4.9 together with Example 4.4.


Theorem 11.6 We use model I of the analysis of variance in Definition 4.1.
Further, let $k_i^T\beta$ ($i = 1, \ldots, q$) with $k_i^T = (k_{i1}, \ldots, k_{i,a+1})$ be estimable functions such
that with the matrix $K = (k_1, \ldots, k_q)$, a null hypothesis is given by $K^T\beta = 0$.
For all vectors $c \in R[K]$ with $rk(K) = q$ and $rk(X) = \dim(\Omega) = p$, by

$$\left[c^T\beta^* - G,\ c^T\beta^* + G\right] \quad (11.29)$$

a class of simultaneous confidence intervals for the $c^T\beta$ with confidence
coefficient $1 - \alpha_e$ is defined, if we put

$$G^2 = q\, F(q, N-p \mid 1-\alpha_e)\, s^2\, c^T (X^T X)^{-} c$$

with $\beta^*$ from (5.3) and

$$s^2 = \frac{1}{N-p}\, Y^T\left[I_N - X(X^T X)^{-} X^T\right] Y.$$


Proof: We apply Theorem 11.9 and Formula (11.23) and put $\theta = X\beta$ (by (5.1)).
Then with $T^T X = K^T$, by

$$(K^T\beta^* - K^T\beta)^T \left[K^T (X^T X)^{-} K\right]^{-1} (K^T\beta^* - K^T\beta) \le q\, s^2 F(q, N-p \mid 1-\alpha_e),$$

a confidence region with confidence coefficient $1 - \alpha_e$ for $K^T\beta$ is given.
Therefore, all (estimable) linear combinations $c^T\beta$ lie with probability $1 - \alpha_e$ in the
interval given by (11.29).


Example 11.2 We use Scheffé's method to test the null hypothesis of
Problem 3 for the one-way analysis of variance in Example 5.1. We have $q = a - 1$,
$p = a = rk(X)$, $\beta^T = (\mu, \alpha_1, \ldots, \alpha_a)$, and considering all linear contrasts $L_r$ in the $\mu_i$,
(11.29) becomes

$$\left[\hat{L}_r - s\sqrt{(a-1)F(a-1, N-a \mid 1-\alpha_e)}\,\sqrt{\sum_{i=1}^{a} \frac{c_{ri}^2}{n_i}},\ \ \hat{L}_r + s\sqrt{(a-1)F(a-1, N-a \mid 1-\alpha_e)}\,\sqrt{\sum_{i=1}^{a} \frac{c_{ri}^2}{n_i}}\right] \quad (11.30)$$

and (11.30) contains all $L_r$ with $\sum_{i=1}^{a} c_{ri} = 0$ with probability $1 - \alpha_e$. Here
$s = \sqrt{MS_{res}}$ (from Table 5.2).
From Lemma 5.1 it follows that all differences $\mu_i - \mu_j$ and the linear contrasts in
the $\mu + \alpha_i$ are estimable. If (11.30) is used to construct confidence intervals for the
$\binom{a}{2}$ differences of expectations only, then the confidence interval in (11.30) is too
large and contains the differences of expectations with a probability $\ge 1 - \alpha_e$.
We say in such cases that the confidence intervals and the corresponding tests
are conservative.


Example 11.3 We consider a two-way cross-classification with model
equation (5.13), that is, we assume interactions and put $n_{ij} = n$ for all $i = 1, \ldots, a$;
$j = 1, \ldots, b$. We denote $\mu + \alpha_i$ ($i = 1, \ldots, a$) as row 'means' and $\mu + \beta_j$ ($j = 1, \ldots, b$)
as column 'means'. If the null hypothesis in Problem 3 has to be tested against
the corresponding alternative hypothesis for the row 'means', we obtain from
(11.29) the confidence interval with confidence coefficient $1 - \alpha_e$

$$\left[\hat{L}_z - A,\ \hat{L}_z + A\right] \quad \text{with } A = s\sqrt{(a-1)F[a-1, ab(n-1) \mid 1-\alpha_e]}\,\sqrt{\frac{1}{bn}\sum_{i=1}^{a} c_{zi}^2} \quad (11.31)$$

for an arbitrary (but fixed) linear contrast in the row 'means':

$$L_z = \sum_{i=1}^{a} c_{zi}(\mu + \alpha_i).$$

If

$$L_s = \sum_{j=1}^{b} c_{sj}(\mu + \beta_j)$$

correspondingly is a linear contrast in the column 'means', then, analogous
to (11.31),

$$\left[\hat{L}_s - B,\ \hat{L}_s + B\right] \quad \text{with } B = s\sqrt{(b-1)F[b-1, ab(n-1) \mid 1-\alpha_e]}\,\sqrt{\frac{1}{an}\sum_{j=1}^{b} c_{sj}^2} \quad (11.32)$$

is a confidence interval with confidence coefficient $1 - \alpha_e$ for an arbitrary (but
fixed) linear contrast $L_s$.

For estimable functions in w,, we analogously can construct confidence 
intervals. 

In (11.31) and (11.32) $s^2 = MS_{res}$ is given in Table 5.13. From Lemma 5.1
it follows that all differences and linear contrasts between the row 'means'
or between the column 'means' are estimable functions and (11.31)
and (11.32) can be used. If only differences of the row 'means' or
column 'means' are of interest, the remarks at the end of Example 11.2 are
again valid.

From Theorem 11.6 it follows that by Scheffé's method, simultaneous
confidence intervals for all linear combinations $c^T\beta$ with $c \in R[K]$ can be
constructed.

Of course, if only differences in expectations are of interest, confidence 
intervals with Scheffé’s method have a too large expected width, and the power 
of the corresponding tests is too small. In those cases, Scheffé’s method will 
not be applied. To show this, we consider an example for this and competing 
methods. 


Example 11.4 A (pseudo-)random number generator has generated ten
samples of size five each. The values of the samples 1-8 are realisations of
$N(50, 64)$ normally distributed random variables; the two other samples differ
only in expectations. We have $\mu_9 = 52$ and $\mu_{10} = 56$, respectively. The generated
samples are shown in Table 11.8.

Differences between means are given in Table 11.9. 


Table 11.8 Simulated observations of Example 11.4.

                               Number of sample
              1      2      3      4      5      6      7      8      9     10
$y_{i1}$     63.4   49.6   50.3   55.5   62.5   30.7   56.7   64.5   44.4   55.7
$y_{i2}$     46.7   48.4   52.8   36.1   45.8   48.6   46.2   42.2   38.2   64.7
$y_{i3}$     59.1   49.3   52.5   54.0   52.8   45.8   41.9   49.6   64.8   61.8
$y_{i4}$     60.7   48.3   58.6   55.9   44.9   44.9   55.8   48.9   43.7   38.9
$y_{i5}$     54.9   51.5   48.0   52.9   51.3   52.9   48.9   40.7   61.3   61.8
$\bar{y}_{i.}$    56.96  49.42  52.44  50.88  51.46  44.58  49.90  49.18  50.48  56.58


Table 11.9 Differences ($\bar{y}_{i.} - \bar{y}_{j.}$) between means of Example 11.4.

 i \ j      2      3      4      5      6      7      8      9     10
   1     7.54   4.52   6.08   5.50  12.38   7.06   7.78   6.48   0.38
   2           -3.02  -1.46  -2.04   4.84  -0.48   0.24  -1.06  -7.16
   3                   1.56   0.98   7.86   2.54   3.26   1.96  -4.14
   4                         -0.58   6.30   0.98   1.70   0.40  -5.70
   5                                 6.88   1.56   2.28   0.98  -5.12
   6                                       -5.32  -4.60  -5.90 -12.00
   7                                               0.72  -0.58  -6.68
   8                                                     -1.30  -7.40
   9                                                            -6.10


The analysis was done with SPSS via 


Analyze 
Compare Means 
One-Way ANOVA 


and define sample as factor and the observations as dependent as in 
Figure 11.1 

If we continue in Figure 11.1 with ok, we obtain the results of a one-way 
analysis of variance with ten samples as factor levels in Figure 11.2. 
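The sums of squares and the $F$ statistic reported by SPSS for this example can be cross-checked directly from the observations in Table 11.8. The sketch below is an independent recomputation, not the SPSS procedure; the function name `one_way_anova` is hypothetical, and only the standard decomposition into between-groups and within-groups sums of squares is used.

```python
# Observations of Table 11.8, one inner list per sample (factor level).
SAMPLES = [
    [63.4, 46.7, 59.1, 60.7, 54.9],
    [49.6, 48.4, 49.3, 48.3, 51.5],
    [50.3, 52.8, 52.5, 58.6, 48.0],
    [55.5, 36.1, 54.0, 55.9, 52.9],
    [62.5, 45.8, 52.8, 44.9, 51.3],
    [30.7, 48.6, 45.8, 44.9, 52.9],
    [56.7, 46.2, 41.9, 55.8, 48.9],
    [64.5, 42.2, 49.6, 48.9, 40.7],
    [44.4, 38.2, 64.8, 43.7, 61.3],
    [55.7, 64.7, 61.8, 38.9, 61.8],
]


def one_way_anova(samples):
    """Return the quantities of a one-way ANOVA table as a dictionary."""
    a = len(samples)
    N = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / N
    # Between-groups sum of squares: weighted squared deviations of the means.
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, means))
    # Within-groups sum of squares: deviations from each sample mean.
    ss_within = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)
    df_between, df_within = a - 1, N - a
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "means": means,
        "ss_between": ss_between, "df_between": df_between,
        "ss_within": ss_within, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


if __name__ == "__main__":
    result = one_way_anova(SAMPLES)
    for key in ("ss_between", "ss_within", "ms_between", "ms_within", "F"):
        print(key, round(result[key], 3))
```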


Figure 11.1 Program start of Example 11.4 in SPSS (One-Way ANOVA dialog). Source: Reproduced
with permission of IBM.


ANOVA
Observation

                  Sum of squares    df    Mean square      F
Between groups         585.549       9       65.061      1.041
Within groups         2500.784      40       62.520
Total                 3086.333      49


Figure 11.2 Output of ANOVA of Example 11.4. Source: Reproduced with permission of IBM.


Because $F = 1.041 < F(9, 40 \mid 0.95)$ (with $\alpha_e = 0.05$), $H_0$ in Problem 3 is
accepted.

If in Figure 11.1 we press 'post hoc', we can select Scheffé's method in Figure 11.3
(page 542).

If we continue and press 'OK', we obtain Table 11.10.


Table 11.10 Confidence intervals by Scheffé's method of Example 11.4 (shortened).


Multiple comparisons
Dependent variable: observation
Scheffé

                                                         95% Confidence interval
(I)      (J)      Mean difference   Std.                 Lower       Upper
sample   sample   (I-J)             error      Sig.      bound       bound
1         2          7.5400         5.0008     0.983     -14.325     29.405
          3          4.5200         5.0008     1.000     -17.345     26.385
          4          6.0800         5.0008     0.997     -15.785     27.945
          5          5.5000         5.0008     0.998     -16.365     27.365
          6         12.3800         5.0008     0.721      -9.485     34.245
          7          7.0600         5.0008     0.990     -14.805     28.925
          8          7.7800         5.0008     0.980     -14.085     29.645
          9          6.4800         5.0008     0.994     -15.385     28.345
         10          0.3800         5.0008     1.000     -21.485     22.245
2         1         -7.5400         5.0008     0.983     -29.405     14.325
          3         -3.0200         5.0008     1.000     -24.885     18.845
          4         -1.4600         5.0008     1.000     -23.325     20.405
          5         -2.0400         5.0008     1.000     -23.905     19.825
          6          4.8400         5.0008     0.999     -17.025     26.705
          7         -0.4800         5.0008     1.000     -22.345     21.385
          8          0.2400         5.0008     1.000     -21.625     22.105
          9         -1.0600         5.0008     1.000     -22.925     20.805
         10         -7.1600         5.0008     0.988     -29.025     14.705

Source: Reproduced with permission of IBM.
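The bounds in Table 11.10 can be re-derived from (11.30). The sketch below is a hedged illustration: the function name is hypothetical, and the quantile $F(9, 40 \mid 0.95) \approx 2.124$ is an assumption taken from a standard $F$ table (it is not quoted in the text); $MS_{res} = 62.520$ with 40 degrees of freedom comes from the ANOVA output of Example 11.4.

```python
import math


def scheffe_half_width(ms_res, f_quantile, a, coeffs, sizes):
    """Half-width of the Scheffé interval (11.30) for one linear contrast.

    ms_res     : residual mean square s^2
    f_quantile : F(a-1, N-a | 1-alpha_e), supplied by the caller
    coeffs     : contrast coefficients c_r1, ..., c_ra (must sum to zero)
    sizes      : sample sizes n_1, ..., n_a
    """
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    s = math.sqrt(ms_res)
    k_r = sum(c * c / n for c, n in zip(coeffs, sizes))
    return s * math.sqrt((a - 1) * f_quantile) * math.sqrt(k_r)


if __name__ == "__main__":
    a, n = 10, 5
    ms_res = 62.520          # from the ANOVA output of Example 11.4
    f_q = 2.124              # assumed table value of F(9, 40 | 0.95)
    contrast = [1, -1] + [0] * (a - 2)   # difference between samples 1 and 2
    half = scheffe_half_width(ms_res, f_q, a, contrast, [n] * a)
    diff = 7.54              # observed difference of the two means
    print(round(diff - half, 3), round(diff + half, 3))
```

The standard error of a difference, $s\sqrt{2/n}$, is the same for every pair here because all samples have size 5, which is why the column 'Std. error' in Table 11.10 is constant.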
11.2.2 Confidence Intervals for Given Contrasts: Bonferroni's and
Dunn's Method


Confidence intervals by Scheffé’s method are not appropriate if confidence 
intervals for k special but not for all contrasts are wanted. Sometimes shorter 
intervals can be obtained using the Bonferroni inequality. 


Theorem 11.7 If the $k$ components $x_i$ of a $k$-dimensional random variable
$x^T = (x_1, \ldots, x_k)$ with distribution function $F(x_1, \ldots, x_k)$ have the same marginal
distribution function $F(x)$, then the Bonferroni inequality

$$1 - F(x_1, \ldots, x_k) \le \sum_{i=1}^{k} [1 - F(x_i)] \quad (11.33)$$

is valid.


Proof: Given $k$ events $A_1, A_2, \ldots, A_k$ of the probability space $(A, B_A, P)$, that is,
let $A_i \in B_A$ ($i = 1, \ldots, k$). Then by mathematical induction it follows from
inclusion and exclusion that

$$P\left(\bigcup_{i=1}^{k} A_i\right) \le \sum_{i=1}^{k} P(A_i).$$

If $\bar{A}_i = \{x_i \le x_i\}$ is the complement of $A_i$, then because $\bigcap_{i=1}^{k} \bar{A}_i = \overline{\bigcup_{i=1}^{k} A_i}$, (11.33) follows.

If $k$ special linear contrasts $L_r = \sum_{i=1}^{a} c_{ri}\mu_i$ ($r = 1, \ldots, k$) are given, then the
estimator $\hat{L}_r = \sum_{i=1}^{a} c_{ri}\bar{y}_{i.}$ is for each $r$ $N(L_r, k_r\sigma^2)$-distributed with $k_r = \sum_{i=1}^{a} \frac{c_{ri}^2}{n_i}$. Then

$$t_r = \frac{\hat{L}_r - L_r}{s\sqrt{k_r}} \quad (r = 1, \ldots, k) \quad (11.34)$$

with $s = \sqrt{MS_{res}}$ are components of a $k$-dimensional random variable. The
marginal distributions are central $t$-distributions with $\nu = \sum_{i=1}^{a} (n_i - 1)$ degrees of
freedom and the density $f(t, \nu)$.

The Bonferroni inequality allows us to find a lower bound of the probability
that all $t_r$-values ($r = 1, \ldots, k$) lie between $-w$ and $w$ ($w > 0$). Due to the symmetry
of the $t$-distribution and Theorem 11.7, we get

$$P = P\{-w \le t_r \le w \mid r = 1, \ldots, k\} \ge 1 - 2k \int_w^{\infty} f(t, \nu)\, dt. \quad (11.35)$$

We select $w$ so that the right-hand side of (11.35) equals $(1 - \alpha_e)$ and obtain
simultaneous $(1 - \alpha_e)$ confidence intervals for the $L_r$ as

$$\left[\hat{L}_r - w\sqrt{k_r}\, s,\ \hat{L}_r + w\sqrt{k_r}\, s\right].$$




Table 11.11 $\left(1 - \frac{0.05}{2k}\right)$-quantiles of the central $t$-distribution with $f$ degrees of freedom.

  f \ k    2      3      4      5      6      7      8      9     10
    5    3.163  3.534  3.810  4.032  4.219  4.382  4.526  4.655  4.773
    6    2.969  3.287  3.521  3.707  3.863  3.997  4.115  4.221  4.317
    7    2.841  3.128  3.335  3.499  3.636  3.753  3.855  3.947  4.029
    8    2.752  3.016  3.206  3.355  3.479  3.584  3.677  3.759  3.832
    9    2.685  2.933  3.111  3.250  3.364  3.462  3.547  3.622  3.690
   10    2.634  2.870  3.038  3.169  3.277  3.368  3.448  3.518  3.581
   11    2.593  2.820  2.981  3.106  3.208  3.295  3.370  3.437  3.497
   12    2.560  2.779  2.934  3.055  3.153  3.236  3.308  3.371  3.428
   15    2.490  2.694  2.837  2.947  3.036  3.112  3.177  3.235  3.286
   20    2.423  2.613  2.744  2.845  2.927  2.996  3.055  3.107  3.153
   30    2.360  2.536  2.657  2.750  2.825  2.887  2.941  2.988  3.030
   40    2.329  2.499  2.616  2.704  2.776  2.836  2.887  2.931  2.971
   50    2.311  2.477  2.591  2.678  2.747  2.805  2.855  2.898  2.937
   60    2.299  2.463  2.575  2.660  2.729  2.786  2.834  2.877  2.915
   80    2.284  2.445  2.555  2.639  2.705  2.761  2.809  2.850  2.887
  100    2.276  2.435  2.544  2.626  2.692  2.747  2.793  2.834  2.871
   ∞     2.241  2.394  2.498  2.579  2.638  2.690  2.734  2.773  2.807


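This means we determine $w$ so that

$$\int_w^{\infty} f(t, \nu)\, dt = \frac{\alpha_e}{2k} = \alpha \quad (11.36)$$

and the Bonferroni inequality (11.33) has the form

$$P \ge 1 - \alpha_e \ge 1 - 2k\alpha. \quad (11.37)$$

For $\alpha_e = 0.05$ these $w$-values $w(k, f, 0.95)$ for some $k$ and $f$ are given in
Table 11.11.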
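For large degrees of freedom the $t$-quantile in (11.36) approaches the corresponding normal quantile, so the limiting values $w(k, \infty, 1 - \alpha_e)$ can be computed with the Python standard library alone. The sketch below covers only this limiting case; the function name is hypothetical, and for finite $f$ a $t$-quantile routine from a statistics package would be needed instead.

```python
from statistics import NormalDist


def bonferroni_w_limit(k, alpha_e=0.05):
    """Limit (f -> infinity) of w in (11.36): the (1 - alpha_e/(2k)) normal quantile."""
    if k < 1:
        raise ValueError("k must be a positive number of contrasts")
    return NormalDist().inv_cdf(1.0 - alpha_e / (2.0 * k))


if __name__ == "__main__":
    for k in range(2, 11):
        print(k, round(bonferroni_w_limit(k), 3))
```

Because the bound (11.35) holds for any dependence between the $t_r$, the resulting intervals are valid but can be wider than necessary when the contrasts are strongly correlated.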
Dunn (1961) published a table with cases for which his method was better
than Scheffé's method. If, among the $k$ contrasts, all $\binom{a}{2}$ differences of the
expectations can be found, then Ury and Wiggins (1971, 1974) gave a
modification and corresponding tables (but see Rodger, 1973).

11.2.3 Confidence Intervals for All Contrasts for $n_i = n$: Tukey's
Method


Definition 11.5 If Y= (y4,..., ya) Tisarandom sample with independent com- 
ponents and N(u, o”)-distributed and if vs”/o” is independent of Y CS(v)-distrib- 
uted, we call the random variable 


av = =e 
the studentised range of Y, if w= max;-1,..4(¥;)- minj-1,...2(y;) is the range 
of Y. 
The augmented studentised range is the random variable 


¥ 1 
Gy = —[max(w, maxt{|y,—1]})] 


Tukey’s method is based on the distribution of $q_{a,\nu}$. We can show that the 
distribution function of the studentised range $q_{a,\nu}$ is given by 

$$\frac{a\,\nu^{\nu/2}}{\Gamma(\nu/2)\,2^{\nu/2-1}} \int_{0}^{\infty}\int_{-\infty}^{\infty} \varphi(z)\left[\Phi(z) - \Phi(z - q_{a,\nu}x)\right]^{a-1} x^{\nu-1} e^{-\nu x^2/2}\,dz\,dx. \qquad (11.38)$$

In (11.38) x = s/σ, φ(z) is the density function and Φ(z) the distribution function 
of the standardised normal distribution. 

We denote by q(a, ν|1 − α) the (1 − α)-quantile of the distribution function of 
$q_{a,\nu}$ in (11.38), which depends on the number a of components of Y and the 
degrees of freedom ν of s² in Definition 11.5. 
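
The quantiles q(a, ν|1 − α) are normally taken from tables. As an informal illustration of Definition 11.5, the following sketch estimates such a quantile by simulation instead of evaluating (11.38). It assumes standard normal components (μ = 0, σ = 1) and draws νs²/σ² from a chi-square distribution by means of a gamma variate. The function names, the number of repetitions and the seed are arbitrary illustrative choices.

```python
import math
import random


def simulate_studentised_range(a, nu, rng):
    """One realisation of q_{a,nu} = w / s for N(0, 1) components."""
    y = [rng.gauss(0.0, 1.0) for _ in range(a)]
    w = max(y) - min(y)                        # range of the sample
    chi2 = rng.gammavariate(nu / 2.0, 2.0)     # CS(nu)-distributed variable
    s = math.sqrt(chi2 / nu)                   # s with nu * s^2 ~ CS(nu)
    return w / s


def mc_quantile(a, nu, p, reps=20000, seed=1):
    """Monte Carlo estimate of the p-quantile q(a, nu | p)."""
    rng = random.Random(seed)
    draws = sorted(simulate_studentised_range(a, nu, rng) for _ in range(reps))
    index = min(reps - 1, math.ceil(p * reps) - 1)
    return draws[index]


q_hat = mc_quantile(a=10, nu=40, p=0.95)
print(round(q_hat, 2))  # should lie near the tabulated q(10, 40 | 0.95) = 4.735
```

The estimate carries simulation error, so it only agrees with the tabulated value to about one or two decimal places.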

Tukey’s method (1953) to construct confidence intervals for the differences 
$\mu_i - \mu_k$ between the expectations of a independent $N(\mu_i, \sigma^2)$-distributed random 
variables $y_i$ (i = 1, ..., a) is based on the equivalence of the probabilities 

$$P\left\{\frac{1}{s}\left|(y_i - y_k) - (\mu_i - \mu_k)\right| \le K \ \text{ for all } i \ne k;\ i, k = 1, \ldots, a\right\}$$

and 

$$P\left\{\max_{i,k}\frac{1}{s}\left[(y_i - \mu_i) - (y_k - \mu_k)\right] \le K \quad (i, k = 1, \ldots, a)\right\}.$$

This equivalence is a consequence of the fact that the validity of the inequality in 
the second term is necessary and sufficient for the validity of the inequality in 
the first term. The maximum of a set of random variables is understood as its 
largest-order statistic, and 

$$\max_{i,k = 1, \ldots, a}\left[y_i - \mu_i - (y_k - \mu_k)\right]$$

is the range w of a independent N(0, σ²)-distributed random variables if the $y_i$ are 
independent of each other and $N(\mu_i, \sigma^2)$-distributed. From this Theorem 11.8 follows. 


Multiple Comparisons 


Theorem 11.8 If $y_1, \ldots, y_a$ are independently $N(\mu_i, \sigma_i^2)$-distributed 
random variables (i = 1, ..., a) with $\sigma_i^2 = \sigma^2$ and $f s^2/\sigma^2$ is independent of the $y_i$ 
(i = 1, ..., a) and CS(f)-distributed, then 

$$P\left\{\left|(y_i - y_k) - (\mu_i - \mu_k)\right| \le q(a, f \mid 1 - \alpha_e)\,s \quad (i, k = 1, \ldots, a)\right\} = 1 - \alpha_e. \qquad (11.39)$$

Therefore by (11.39) a class of simultaneous confidence intervals with confi- 
dence coefficient $1 - \alpha_e$ is given. 
The results of Theorem 11.8 are shown in two examples. 


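Example 11.5 Tukey’s method is used to construct confidence intervals for 
differences in expectations and to test the first problem in one-way analysis 
of variance (see Example 5.1). For this we have to assume $n_i = n$. $\bar y_{1.}, \ldots, \bar y_{a.}$ are 
the means of the observations $y_{ij}$ of the a factor levels. For the differences 
$\mu + a_i - (\mu + a_k) = a_i - a_k$, simultaneous confidence intervals can be constructed 
with (11.39). For i = 1, ..., a it holds $\mathrm{var}(\bar y_{i.}) = \sigma^2/n$. We estimate σ² by $MS_{res} = s^2$ 
in Table 4.2 ($n_i = n$), that is, with the totals $Y_{i.} = \sum_{j=1}^{n} y_{ij}$, 

$$s^2 = \frac{1}{a(n-1)}\left[\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{1}{n}\sum_{i=1}^{a} Y_{i.}^2\right].$$

Now $\frac{a(n-1)}{\sigma^2}s^2$ is CS[a(n − 1)]-distributed and independent of the $\bar y_{i.} - \bar y_{k.}$. From 
Theorem 11.8 with f = a(n − 1) and σ²/n for σ², we obtain the class of simultaneous 
confidence intervals with confidence coefficient $1 - \alpha_e$ for $\mu_i - \mu_k$: 

$$\left[\bar y_{i.} - \bar y_{k.} - q(a, a(n-1) \mid 1 - \alpha_e)\frac{s}{\sqrt{n}},\ \ \bar y_{i.} - \bar y_{k.} + q(a, a(n-1) \mid 1 - \alpha_e)\frac{s}{\sqrt{n}}\right]$$
$$(i \ne k;\ i, k = 1, \ldots, a). \qquad (11.40)$$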


Example 11.6 Analogously to Example 11.3 we consider the two-way cross- 
classification with model equation (5.13) and construct simultaneous confi- 
dence intervals for the differences between the row ‘means’ and column ‘means’ 
introduced in Example 11.3. Again, let $n_{ij} = n$ for all (i, j). Of course, this is a lim- 
itation of the method. 

For the row ‘means’ we have $\mathrm{var}(\bar y_{i..}) = \sigma^2/(bn)$ and for the column ‘means’ 
$\mathrm{var}(\bar y_{.j.}) = \sigma^2/(an)$. With s² from Table 5.13 and f = ab(n − 1), from Theorem 
11.8 follows the class of simultaneous intervals for $\mu + a_i - (\mu + a_k) = a_i - a_k$ 
with confidence coefficient $(1 - \alpha_e)$: 

$$\left[\bar y_{i..} - \bar y_{k..} - q(a, ab(n-1) \mid 1 - \alpha_e)\frac{s}{\sqrt{bn}},\ \ \bar y_{i..} - \bar y_{k..} + q(a, ab(n-1) \mid 1 - \alpha_e)\frac{s}{\sqrt{bn}}\right]$$
$$(i \ne k;\ i, k = 1, \ldots, a) \qquad (11.41)$$

and analogously for the column ‘means’ the class of confidence intervals 

$$\left[\bar y_{.j.} - \bar y_{.k.} - q(b, ab(n-1) \mid 1 - \alpha_e)\frac{s}{\sqrt{an}},\ \ \bar y_{.j.} - \bar y_{.k.} + q(b, ab(n-1) \mid 1 - \alpha_e)\frac{s}{\sqrt{an}}\right]$$
$$(j \ne k;\ j, k = 1, \ldots, b). \qquad (11.42)$$


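We can show that for any contrast $L = \sum_{i=1}^{a} c_i\mu_i$ ($c_i$ real), in generalisation of 
(11.39) with 

$$\hat L = \sum_{i=1}^{a} c_i y_i$$

the relation 

$$\hat L - \frac{1}{2}\,q(a, f \mid 1 - \alpha_e)\,s\sum_{i=1}^{a}|c_i| \ \le\ L\ \le\ \hat L + \frac{1}{2}\,q(a, f \mid 1 - \alpha_e)\,s\sum_{i=1}^{a}|c_i| \qquad (11.43)$$

holds simultaneously for all L with probability $1 - \alpha_e$. 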
If only the set of $\binom{a}{2}$ differences in expectations $\mu_i - \mu_j$ (i ≠ j; i, j = 1, ..., a) is 
considered, Tukey’s method gives shorter simultaneous confidence intervals than 
Scheffé’s method. Tukey’s method is then preferable if $n_i = n$ is given. We 
continue with Example 11.4, now using the button ‘Tukey’ in Figure 11.3. 
Analogously to Table 11.10 we obtain Table 11.12. 


[Figure 11.3: SPSS dialog box ‘One-Way ANOVA: Post Hoc Multiple Comparisons’. Options under 
‘Equal Variances Assumed’: LSD, Bonferroni, Sidak, Scheffe, R-E-G-W F, R-E-G-W Q, S-N-K, Tukey, 
Tukey's-b, Duncan, Hochberg's GT2, Gabriel, Waller-Duncan, Dunnett. Options under ‘Equal 
Variances Not Assumed’: Tamhane's T2, Dunnett's T3, Games-Howell, Dunnett's C. Field: 
Significance level. Buttons: Continue, Cancel, Help.] 

Figure 11.3 Post hoc for multiple comparisons of means in SPSS. Source: Reproduced with 
permission of IBM. 


Table 11.12 Confidence intervals by Tukey’s method of Example 11.4 (shortened). 

Multiple comparisons 
Dependent variable 
Tukey HSD 

                                                       95% Confidence interval 
(I)     (J)     Mean difference  Std.              Lower      Upper 
sample  sample  (I-J)            error    Sig.     bound      bound 

1       2        7.54000         5.00078  0.881    -9.2017    24.2817 
1       3        4.52000         5.00078  0.995   -12.2217    21.2617 
1       4        6.08000         5.00078  0.965   -10.6617    22.8217 
1       5        5.50000         5.00078  0.982   -11.2417    22.2417 
1       6       12.38000         5.00078  0.312    -4.3617    29.1217 
1       7        7.06000         5.00078  0.916    -9.6817    23.8017 
1       8        7.78000         5.00078  0.861    -8.9617    24.5217 
1       9        6.48000         5.00078  0.949   -10.2617    23.2217 
1       10       0.38000         5.00078  1.000   -16.3617    17.1217 
2       1       -7.54000         5.00078  0.881   -24.2817     9.2017 
2       3       -3.02000         5.00078  1.000   -19.7617    13.7217 
2       4       -1.46000         5.00078  1.000   -18.2017    15.2817 
2       5       -2.04000         5.00078  1.000   -18.7817    14.7017 
2       6        4.84000         5.00078  0.993   -11.9017    21.5817 
2       7       -0.48000         5.00078  1.000   -17.2217    16.2617 
2       8        0.24000         5.00078  1.000   -16.5017    16.9817 
2       9       -1.06000         5.00078  1.000   -17.8017    15.6817 
2       10      -7.16000         5.00078  0.910   -23.9017     9.5817 

Source: Reproduced with permission of IBM. 


11.2.4 Confidence Intervals for All Contrasts: Generalised Tukey’s Method 


Spjøtvoll and Stoline (1973) generalised Tukey’s method of Section 11.2.3 
without assuming $n_i = n$. 


Definition 11.6 Given a independent N(μ, σ²)-distributed random variables 
$y_i$ (i = 1, ..., a), let s² be an estimator of σ² independent of the $y_i$ with ν degrees 
of freedom. Then the random variable 

$$\frac{1}{s}\max\left(w, \max_{i}|y_i - \mu|\right) = q^{*}(a, \nu)$$

with the range w of the $y_i$ as in Definition 11.5 is called the augmented studentised 
range of the $y_i$ with ν degrees of freedom. 
Theorem 11.9 (Spjøtvoll and Stoline) 
If all conditions of Theorem 11.8 are fulfilled, all linear contrasts $L = \sum_{i=1}^{a} c_i\mu_i$ 
simultaneously are covered with probability $1 - \alpha_e$ by the intervals 

$$\left[\hat L - \frac{1}{2}\sum_{i=1}^{a}|c_i|\,q^{*}(a, f \mid 1 - \alpha_e)\,s,\ \ \hat L + \frac{1}{2}\sum_{i=1}^{a}|c_i|\,q^{*}(a, f \mid 1 - \alpha_e)\,s\right]. \qquad (11.44)$$

In (11.44) $\hat L = \sum_{i=1}^{a} c_i y_i$, and $q^{*}(a, f \mid 1 - \alpha_e)$ is the $(1 - \alpha_e)$-quantile of the dis- 
tribution of the augmented studentised range $q^{*}(a, f)$ corresponding to 
Definition 11.6. 


The proof is given in Spjøtvoll and Stoline (1973). It is based on the transition 
to random variables $x_i = \frac{1}{\sigma_i}y_i$, having the same variance and restoring the prob- 
lem to that handled in Section 11.2.3. Spjøtvoll and Stoline approximate the 
quantiles $q^{*}(a, f \mid 1 - \alpha_e)$ by the quantiles $q(a, f \mid 1 - \alpha_e)$ of the studentised range, 
but Stoline (1978) gave tables of $q^{*}(a, f \mid 1 - \alpha_e)$. 

The generalised Tukey’s method can give shorter as well as longer 
confidence intervals than Scheffé’s method, depending on the degree of 
unbalancedness. 

A further generalisation of Tukey’s method can be found in Hochberg (1974) 
and Hochberg and Tamhane (1987). 


Theorem 11.10 Theorem 11.9 is still valid if (11.44) is replaced by 


[sae (oa) esie (Jo) 


(11.45) 


Here $q^{**}\left(\binom{a}{2}, f \mid 1 - \alpha_e\right)$ is the quantile of the distribution of $q^{**}\left(\binom{a}{2}, f\right)$ in 
Definition 11.6. These quantiles are given in Stoline and Ury (1979). 


11.2.5 Confidence Intervals for the Differences of Treatments with a 
Control: Dunnett’s Method 


Sometimes a — 1 treatments have to be compared with a standard procedure 
called control. 
Then simultaneous $1 - \alpha_e$ confidence intervals for the a − 1 differences 

$$\mu_i - \mu_a \quad (i = 1, \ldots, a - 1)$$

shall be constructed. (After renumbering, $\mu_a$ is always the expectation of the 
control.) 




We consider a independent $N(\mu_i, \sigma^2)$-distributed random variables $y_i$ inde- 
pendent of a CS(f)-distributed random variable $\frac{f s^2}{\sigma^2}$. 
Dunnett (1955) derived the distribution of 
(4% teat 2a) 
7 


Dunnett (1964) and Bechhofer and Dunnett (1988) present the quantiles 
$d(a - 1, f \mid 1 - \alpha_e)$ of the distribution of 

$$d = \max_{1 \le i \le a-1}\frac{\left|y_i - y_a - (\mu_i - \mu_a)\right|}{s\sqrt{2}}.$$

We see that $d \le d(a - 1, f \mid 1 - \alpha_e)$ is necessary and sufficient for 

$$\frac{1}{s\sqrt{2}}\left|y_i - y_a - (\mu_i - \mu_a)\right| \le d(a - 1, f \mid 1 - \alpha_e)$$

for all i. By 

$$\left[y_i - y_a - d(a - 1, f \mid 1 - \alpha_e)\,s\sqrt{2},\ \ y_i - y_a + d(a - 1, f \mid 1 - \alpha_e)\,s\sqrt{2}\right], \qquad (11.46)$$

a class of confidence intervals is given, covering all differences $\mu_i - \mu_a$ with prob- 
ability $1 - \alpha_e$. 
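For the one-way classification with the notation of Example 11.5, we obtain 
for equal subclass numbers n the class of confidence intervals: 

$$\left[\bar y_{i.} - \bar y_{a.} - d(a-1, a(n-1) \mid 1 - \alpha_e)\,s\sqrt{\frac{2}{n}},\ \ \bar y_{i.} - \bar y_{a.} + d(a-1, a(n-1) \mid 1 - \alpha_e)\,s\sqrt{\frac{2}{n}}\right]$$
$$(i = 1, \ldots, a - 1). \qquad (11.47)$$

For the two-way cross-classification (model (5.13)) with the notation of Exam- 
ple 11.6, we obtain for equal subclass numbers the class of confidence intervals 
for the row ‘means’: 

$$\left[\bar y_{i..} - \bar y_{a..} - d(a-1, ab(n-1) \mid 1 - \alpha_e)\,s\sqrt{\frac{2}{bn}},\ \ \bar y_{i..} - \bar y_{a..} + d(a-1, ab(n-1) \mid 1 - \alpha_e)\,s\sqrt{\frac{2}{bn}}\right]$$
$$(i = 1, \ldots, a - 1) \qquad (11.48)$$

and for the column ‘means’ 

$$\left[\bar y_{.i.} - \bar y_{.b.} - d(b-1, ab(n-1) \mid 1 - \alpha_e)\,s\sqrt{\frac{2}{an}},\ \ \bar y_{.i.} - \bar y_{.b.} + d(b-1, ab(n-1) \mid 1 - \alpha_e)\,s\sqrt{\frac{2}{an}}\right]$$
$$(i = 1, \ldots, b - 1). \qquad (11.49)$$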


We continue with Example 11.4, now using ‘Dunnett’ in Figure 11.3, and obtain 
Table 11.13, analogous to Tables 11.10 and 11.12, taking the last (10th) sample 
as the control. 


Table 11.13 Risk of the first kind multiple comparisons. 


Dependent variable: observation 
Dunnett t (2-sided) (a) 

                                                       95% Confidence interval 
(I)     (J)     Mean difference  Std.              Lower      Upper 
sample  sample  (I-J)            error    Sig.     bound      bound 

1       10        0.38000        5.00078  1.000   -13.6810    14.4410 
2       10       -7.16000        5.00078  0.629   -21.2210     6.9010 
3       10       -4.14000        5.00078  0.964   -18.2010     9.9210 
4       10       -5.70000        5.00078  0.829   -19.7610     8.3610 
5       10       -5.12000        5.00078  0.893   -19.1810     8.9410 
6       10      -12.00000        5.00078  0.125   -26.0610     2.0610 
7       10       -6.68000        5.00078  0.698   -20.7410     7.3810 
8       10       -7.40000        5.00078  0.594   -21.4610     6.6610 
9       10       -6.10000        5.00078  0.778   -20.1610     7.9610 

(a) Dunnett t-tests treat one group as a control and compare all other groups against it. 


Source: Reproduced with permission of IBM. 


11.2.6 Multiple Comparisons and Confidence Intervals 


We now discuss the problems given at the start of Section 11.2. Let $P_i$ be the a 
levels of the factor in a one-way analysis of variance model I. For independent 
random samples $Y_i$, we have for the components the model equation 

$$y_{ij} = \mu_i + e_{ij} \quad (i = 1, \ldots, a;\ j = 1, \ldots, n_i) \qquad (11.50)$$

with error terms $e_{ij}$ that are N(0, σ²)-distributed. 
$s^2 = MS_{res}$ in Table 4.2 is an estimator of σ² that is independent of the a sample 
‘means’ $\bar y_{i.}$. The degrees of freedom of $MS_{res}$ are 

$$\sum_{i=1}^{a}(n_i - 1) = N - a,$$

and $\frac{1}{\sigma^2}(N - a)s^2$ is CS(N − a)-distributed. 

Problem 3 can be handled by the F-test; $H_0$ is rejected, if 

$$\frac{MS_A}{MS_{res}} > F(a - 1, N - a \mid 1 - \alpha_e). \qquad (11.51)$$




Problems 4 and 5 are solved by the methods of construction of confidence 
intervals (see Problem 4 and Problem 5 below). 

If the $\binom{a}{2}$ and a − 1 null hypotheses of Problems 4 and 5, respectively, have 
to be tested individually and not in total, so that for each pair of hypotheses 
$(H_{0ij}, H_{Aij})$ the first kind risk is $\alpha_{ij} = \alpha$, then we use the multiple t-procedure 
and reject $H_{0ij}$ if 

$$|t_{ij}| = \frac{|\bar y_{i.} - \bar y_{j.}|}{s}\sqrt{\frac{n_i n_j}{n_i + n_j}} > t\left(N - a \,\Big|\, 1 - \frac{\alpha}{2}\right). \qquad (11.52)$$

The risk α must be understood for each of the $\binom{a}{2}$ and a − 1 single comparisons, 
respectively, and we therefore call it a comparison-wise first kind risk. In 
Problem 5 we always have i = a and j = 1, ..., a − 1; in Problem 4 we have 
i ≠ j; i, j = 1, ..., a. 

The minimal size of the experiment $N = \sum_{i=1}^{a} n_i$ comes out if $n_i = n$; i = 1, ..., 
a. The value n depends on the comparison-wise first kind risk α, on the 
comparison-wise second kind risk β and on the effect size δ analogous to 
Section 3.4.2.1.1 as 

$$n = \left\lceil \frac{2\sigma^2}{\delta^2}\left[t\left(a(n-1) \,\Big|\, 1 - \frac{\alpha}{2}\right) + t\left(a(n-1) \mid 1 - \beta\right)\right]^2 \right\rceil.$$


Example 11.7 We plan pairwise comparisons for a = 8 factor levels and use 
α = 0.05; β = 0.1; and δ = σ. We start with ∞ degrees of freedom and calculate 
iteratively 

$$n_1 = \left\lceil 2\left[t(\infty \mid 0.975) + t(\infty \mid 0.9)\right]^2 \right\rceil = \left\lceil 2(1.96 + 1.2816)^2 \right\rceil = 22$$

and in the second step 

$$n_2 = \left\lceil 2\left[t(168 \mid 0.975) + t(168 \mid 0.9)\right]^2 \right\rceil = \left\lceil 2(1.9748 + 1.2864)^2 \right\rceil = 22.$$

Therefore n = 22. 
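
The iteration of Example 11.7 can be sketched in a few lines. This is only an illustration and not the OPDOE implementation: it assumes the sample size formula in the form $n = \lceil 2(\sigma/\delta)^2[t(a(n-1) \mid 1-\alpha/2) + t(a(n-1) \mid 1-\beta)]^2 \rceil$, and it obtains the t-quantiles by numerical integration and bisection so that only the Python standard library is needed. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the central t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile t(df | p) for p >= 0.5; df = math.inf gives the normal quantile."""
    if math.isinf(df):
        return NormalDist().inv_cdf(p)
    lo, hi = 0.0, 100.0
    for _ in range(60):                      # bisection on the distribution function
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def subclass_number(a, alpha, beta, delta_over_sigma=1.0, max_iter=50):
    """Iterate n = ceil(2 (sigma/delta)^2 [t(.|1-alpha/2) + t(.|1-beta)]^2)."""
    df, n = math.inf, None                   # start with infinitely many d.f.
    for _ in range(max_iter):
        t_sum = t_quantile(1 - alpha / 2, df) + t_quantile(1 - beta, df)
        n_new = math.ceil(2 * (t_sum / delta_over_sigma) ** 2)
        if n_new == n:                       # value no longer changes
            return n
        n, df = n_new, a * (n_new - 1)
    return n


print(subclass_number(a=8, alpha=0.05, beta=0.1))   # setting of Example 11.7
```

With a = 8, α = 0.05, β = 0.1 and δ = σ the iteration stops at n = 22, as in the example.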

However, if the first kind risks shall be chosen so that the probability 
that at least one of the null hypotheses $H_{0ij}$ is wrongly rejected is $\alpha_e$, we proceed as 
follows. 


Problem 4 If all $n_i = n$, we use the Tukey procedure. For all pairs $\mu_i - \mu_j$ (i ≠ j; 
i, j = 1, ..., a), we calculate a confidence interval by (11.40). 

If the corresponding realised confidence interval does not cover 0, $H_{0ij}$ is rejected. In 
other words we reject $H_{0ij}$ if 

$$\frac{|\bar y_{i.} - \bar y_{j.}|\sqrt{n}}{s} > q(a, a(n-1) \mid 1 - \alpha_e). \qquad (11.53)$$


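For unequal $n_i$ we calculate in place of (11.53) a confidence interval with 
$M_{ij} = \min(\sqrt{n_i}, \sqrt{n_j})$ and f = N − a by 

$$\left[\bar y_{i.} - \bar y_{j.} - q^{*}(a, f \mid 1 - \alpha_e)\frac{s}{M_{ij}},\ \ \bar y_{i.} - \bar y_{j.} + q^{*}(a, f \mid 1 - \alpha_e)\frac{s}{M_{ij}}\right] \qquad (11.54)$$

and continue analogously, that is, we reject $H_{0ij}$ if 

$$\frac{|\bar y_{i.} - \bar y_{j.}|\,M_{ij}}{s} > q^{*}(a, N - a \mid 1 - \alpha_e). \qquad (11.55)$$

The minimal size of the experiment $N = \sum_{i=1}^{a} n_i$ comes out if $n_i = n$; i = 1, ..., a. 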


With experiment-wise error probability a, and comparison-wise risk of the 
second kind , we receive 


He pete +t(a(n-1)|1 a) | 


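Alternatively to (11.55), with $R_{ij} = \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$, in place of (11.54) we can use 

$$\left[\bar y_{i.} - \bar y_{j.} - q^{**}\left(\tbinom{a}{2}, f \,\Big|\, 1 - \alpha_e\right)R_{ij}\,s,\ \ \bar y_{i.} - \bar y_{j.} + q^{**}\left(\tbinom{a}{2}, f \,\Big|\, 1 - \alpha_e\right)R_{ij}\,s\right] \qquad (11.56)$$

and reject $H_{0ij}$ if 

$$\frac{|\bar y_{i.} - \bar y_{j.}|}{R_{ij}\,s} > q^{**}\left(\tbinom{a}{2}, N - a \,\Big|\, 1 - \alpha_e\right) \qquad (11.57)$$

(Hochberg procedure). 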


Problem 5 If $n_i = n$, we use the Dunnett procedure, based on confidence inter- 
vals of Dunnett’s method. Then $H_{0i}$ is rejected if 

$$\frac{|\bar y_{i.} - \bar y_{a.}|\sqrt{n}}{s\sqrt{2}} > d(a - 1, a(n-1) \mid 1 - \alpha_e). \qquad (11.58)$$

If the $n_i$ are not equal, we use a method also proposed by Dunnett (1964) with mod- 
ified quantiles. 


Numerical Example 


To determine the minimal sample sizes in multiple comparisons for the pairs 
of hypotheses $(H_{0ij}; H_{Aij})$ and $(H_{0i}; H_{Ai})$, respectively, with α or $\alpha_e$, an upper 
bound $\beta_0$ for the second kind risks $\beta_{ij}$ and $\beta_i$ and $|\mu_i - \mu_j| > \Delta_{ij}$ given in advance, 
we use the R-commands in the program OPDOE: 

>size.multiple_t.test for Problem 1; 
>size.multiple_t.test.comp_standard 

or 
>sizes.dunnett.exp_wise for comparisons with a control. 


11.2.7 Which Multiple Comparison Shall Be Used? 


To answer the question of which of the different multiple comparison procedures 
shall be used, we must first decide whether Problem 3, 4 or 5 shall be solved 
(all corresponding assumptions for the procedures must be fulfilled). 

Because Problem 3 is a two-decision problem, we use the F-test here; $\alpha_e$ and 
$1 - \beta_e$ are probabilities to be understood experiment-wise. 

If Problem 4 is to be solved, we first have to decide whether the first kind 
risk shall be comparison-wise α for each test separately or whether it has to be 
understood as the probability $\alpha_e$ (called the experiment-wise first kind risk) that 
at least one of the null hypotheses $H_{0ij}$ is wrongly rejected. If the first kind risk is com- 
parison-wise, we use the multiple t-procedure; otherwise we use the Tukey proce- 
dure for $n_i = n$ and either the Spjøtvoll–Stoline procedure or the Hochberg 
procedure for unequal subclass numbers. Ury (1976) argued for using the 
Spjøtvoll–Stoline procedure mainly if only small differences between the $n_i$ occur. 
If the $n_i$ differ strongly, he recommends using the Hochberg procedure. 

For Problem 5 with a comparison-wise α, the multiple t-procedure must be 
used; otherwise the Dunnett procedure is recommended. 

If all or many linear contrasts shall be tested, the Scheffé, the Spjøtvoll–Stoline 
or the Hochberg procedure is recommended. Occasionally the Dunn procedure 
leads to useful intervals and tests. The Bonferroni and the Scheffé procedure can 
be used even if the random variables are correlated. 


11.3 A Numerical Example 


We demonstrate the methods of this chapter by Example 11.4 and solve the pro- 
blems below. 

Problem (a) Test the null hypotheses $H_{0ij}$ against $H_{Aij}$. For Problem 5 the 
sample 10 corresponds with the control. We use $\alpha_e = 0.05$; α = 0.05. 

Problem (b) Construct simultaneous (1 − 0.05) confidence intervals for the 
contrasts 

$$L_1 = 9\mu_{10} - \sum_{i=1}^{9}\mu_i, \qquad L_2 = 3\mu_1 - \mu_2 - \mu_3 - \mu_4,$$
$$L_3 = 5\mu_1 - 3\mu_2 - 2\mu_3, \qquad L_4 = 25\mu_1 - 15\mu_2 - 8\mu_3 - 2\mu_4$$

and for the differences in expectations $\mu_i - \mu_{10}$ (i = 4, ..., 9), denoted by $L_5$ 
to $L_{10}$. 
Problem (c) Select the population with the largest expectation and select the 
2, 3, 4 and 5 populations with the largest expectations. 
Problem (a). Due to (11.52), $H_{0ij}$ is rejected, if 

$$|\bar y_{i.} - \bar y_{j.}| > t(40 \mid 0.975)\,s\sqrt{\frac{2}{5}} = 10.107,$$

where s is obtained from Figure 11.2 as $s = \sqrt{62.52} = 7.907$ and from Table D.3 
t(40|0.975) = 2.0211. We see in Table 11.9 that for Problem 4 with the multiple 
t-procedure, the null hypothesis $H_{0;1,6}$ is wrongly but $H_{0;6,10}$ is correctly 
rejected. Among the 43 accepted null hypotheses, 16 have been wrongly 
accepted. The reason for this high percentage is that the subclass number is too 
small to hold a reasonable risk of the second kind. In Section 11.2.6 we presented a 
formula for the needed subclass number, and this gives for α = 0.05, β = 0.2 
and δ = σ the value n = 16, but we simulated in Example 11.4 only a subclass 
number n = 5. 

For Problem 5 $H_{0,6}$ is correctly rejected, but the other eight null hypotheses are 
wrongly accepted. Because the subclass numbers are equal, the Tukey proce- 
dure can be applied. 

Because q(10, 40|0.95) = 4.735, all those $H_{0ij}$ of Problem 4 have to be 
rejected for which 

$$|\bar y_{i.} - \bar y_{j.}| > \frac{7.907}{\sqrt{5}}\cdot 4.735 = 16.744,$$

and this is not the case for any pair (i, j). 
For Problem 5 $H_{0i}$ (i = 1, ..., 9) is rejected, if [d(9, 40|0.95) = 2.81] 

$$|\bar y_{i.} - \bar y_{10.}| > \frac{\sqrt{2}\cdot 7.907}{\sqrt{5}}\cdot 2.81 = 14.05,$$

and this is not the case for any i either. 
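
The three critical differences of Problem (a) can be recomputed with a short sketch. The quantiles are copied from the text, and the differences to the control are those of Table 11.13 (only their absolute values matter here). The variable names are illustrative.

```python
import math

s = math.sqrt(62.52)             # residual standard deviation, about 7.907
n = 5                            # subclass number of Example 11.4
se_diff = s * math.sqrt(2 / n)   # standard error of a difference of two means

t_quant = 2.0211                 # t(40 | 0.975), from the text
q_quant = 4.735                  # q(10, 40 | 0.95), from the text
d_quant = 2.81                   # d(9, 40 | 0.95), from the text

crit_t = t_quant * se_diff                 # multiple t-procedure, (11.52)
crit_tukey = q_quant * s / math.sqrt(n)    # Tukey procedure, (11.53)
crit_dunnett = d_quant * se_diff           # Dunnett procedure, (11.58)

# Mean differences (sample i minus control sample 10) from Table 11.13.
diff_to_control = {1: 0.38, 2: -7.16, 3: -4.14, 4: -5.70, 5: -5.12,
                   6: -12.00, 7: -6.68, 8: -7.40, 9: -6.10}

rejected_t = [i for i, d in diff_to_control.items() if abs(d) > crit_t]
rejected_dunnett = [i for i, d in diff_to_control.items() if abs(d) > crit_dunnett]

print(round(crit_t, 3), round(crit_tukey, 3), round(crit_dunnett, 2))
print(rejected_t, rejected_dunnett)
```

Under these inputs only sample 6 exceeds the bound of the multiple t-procedure, and no sample exceeds the Dunnett bound, which agrees with the statements above.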

With a subclass number 5 in Example 11.4, many incorrect decisions have 
been made. The subclass numbers for the α- and $\alpha_e$-values above are chosen 
so that a difference $|\mu_i - \mu_j| > \delta$ is not detected with probability of at most β = 0.1. 
This is calculated with OPDOE of R. 




Problem 3. We obtain from the command in Section 5.2.2.2 (delta = δ/σ) 

> size.anova(model="a", a=10, alpha=0.05, beta=0.1, 
+ delta=1, case="minimin") 
n 
9 

> size.anova(model="a", a=10, alpha=0.05, beta=0.1, 
+ delta=1, case="maximin") 
n 
21 

and choose n = 15. 
Problem 4 Multiple t-procedure: The output of R shows n = 22 observations 
per population. 
For the Tukey procedure, we obtain n = 36. 
Problem 5 Multiple t-procedure: We obtain $n_{10} = n_0 = 45$ and $n_i = 14$ (i < 10). 
Dunnett procedure: We obtain $n_{10} = n_0 = 63$ and $n_i = 23$ (i < 10). 


Problem (b). We calculate the estimates of the contrasts from the means $\bar y_{i.}$ in 
Table 11.8: 

$$\hat L_1 = 53.92, \quad \hat L_2 = 18.14, \quad \hat L_3 = 31.66, \quad \hat L_4 = 161.42.$$

Scheffé’s method: Using (11.30), we need for each contrast $w_r = \sqrt{\sum_{i=1}^{a} c_{ri}^2/n_i}$. 
We obtain 

$$w_1 = 4.2426, \quad w_2 = 1.5492, \quad w_3 = 2.7568, \quad w_4 = 13.5499.$$

For all contrasts $L_{4+i} = \mu_{3+i} - \mu_{10}$ (i = 1, ..., 6) we obtain $w_{4+i} = 0.6325$. From 
F(9, 40|0.95) = 2.1240 results 

$$s\sqrt{(a-1)F(a-1, N-a \mid 0.95)} = 34.5709.$$

The confidence intervals of the contrasts have the bounds $\hat L_r \pm D_r^S$ 
(r = 1, ..., 10; S stands for Scheffé) with 

$$D_1^S = 146.67, \quad D_2^S = 53.56, \quad D_3^S = 95.30, \quad D_4^S = 468.43,$$
$$D_{4+i}^S = 21.86 \quad (i = 1, \ldots, 6).$$

All these confidence intervals cover 0, and therefore none of the hypotheses 
$H_{0r}: L_r = 0$ (r = 1, ..., 10) is rejected. 
Dunn’s method: Using (11.36), the bounds of the confidence intervals are 
$\hat L_r \pm D_r^D$ with $D_r^D = w_r \cdot s \cdot w$, where w can be found in Table 11.11. Because the 
number of contrasts equals 10, we read for f = 40 the value w = 2.97. If confidence 
intervals are calculated for $L_1, L_2, L_3, L_4$ only, then w = 2.62. For the complete set 
of contrasts, we have 

$$D_1^D = 99.63, \quad D_2^D = 36.38, \quad D_3^D = 64.74, \quad D_4^D = 318.20,$$
$$D_{4+i}^D = 14.85 \quad (i = 1, \ldots, 6).$$

Simultaneous 0.95 confidence intervals for $L_1$ to $L_4$ only give 

$$D_1^D = 87.89, \quad D_2^D = 32.09, \quad D_3^D = 57.11, \quad D_4^D = 280.70.$$

Tukey’s method: Using (11.43), simultaneous 0.95 confidence intervals are 
$\hat L_r \pm D_r^T$ (r = 1, ..., 10) with 

$$D_r^T = \frac{s}{2\sqrt{5}}\,q(10, 40 \mid 0.95)\sum_{i=1}^{a}|c_{ri}|.$$

Because q(10, 40|0.95) = 4.735, we obtain 

$$D_1^T = 150.69, \quad D_2^T = 50.23, \quad D_3^T = 83.72, \quad D_4^T = 418.59,$$
$$D_{4+i}^T = 16.74 \quad (i = 1, \ldots, 6).$$

Confidence intervals with an individual confidence coefficient 0.95 by the mul- 
tiple t-method of course are shorter but not comparable. 
From Table 11.14 we see that only the method of Dunn is uniformly better 
than the others. 
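
The half widths of Problem (b) follow from a handful of constants, so they can be recomputed as a plausibility check. In the sketch below the contrast coefficients are written out as vectors of length 10. The coefficients of $L_3$ and $L_4$ are an assumption inferred from the printed values of $w_r$, $\sum|c_{ri}|$ and $\hat L_r$, and $L_5$ stands for any of the differences $L_5, \ldots, L_{10}$. The quantiles are copied from the text.

```python
import math

s, n, a = 7.907, 5, 10          # residual s.d., subclass number, number of samples
F_quant = 2.1240                # F(9, 40 | 0.95)
w_dunn = 2.97                   # w(10, 40, 0.95) from Table 11.11
q_tukey = 4.735                 # q(10, 40 | 0.95)

contrasts = {
    "L1": [-1] * 9 + [9],
    "L2": [3, -1, -1, -1, 0, 0, 0, 0, 0, 0],
    "L3": [5, -3, -2, 0, 0, 0, 0, 0, 0, 0],        # assumed coefficients
    "L4": [25, -15, -8, -2, 0, 0, 0, 0, 0, 0],     # assumed coefficients
    "L5": [0, 0, 0, 1, 0, 0, 0, 0, 0, -1],         # mu_4 - mu_10
}


def half_widths(c):
    """Half widths (Scheffe, Dunn, Tukey) for one contrast with equal n."""
    w_r = math.sqrt(sum(ci ** 2 for ci in c) / n)
    scheffe = s * math.sqrt((a - 1) * F_quant) * w_r
    dunn = w_r * s * w_dunn
    tukey = s / (2 * math.sqrt(n)) * q_tukey * sum(abs(ci) for ci in c)
    return scheffe, dunn, tukey


results = {name: half_widths(c) for name, c in contrasts.items()}
for name, (d_s, d_d, d_t) in results.items():
    print(name, round(d_s, 2), round(d_d, 2), round(d_t, 2))
```

Small deviations in the last digit against the printed values can occur because the text rounds intermediate results.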
Problem (c). Using Selection Rule 11.1, then 

$$\bar y_{(10).} = \bar y_{1.} = 56.96 = \max_{1 \le i \le 10}(\bar y_{i.}).$$

We wrongly call population 1 the best one. This happens due to the fact that 
n = 5 is too small. Also other rules lead to wrong conclusions. 
Finally we compare the minimal sample sizes of the methods used for 
Example 11.4. 

Method                         n for α = 0.05, α_e = 0.05,   Remarks 
                               β = 0.05; d = σ 
Selection rule 11.1 (t = 1)    12 
Tukey’s procedure              40                            45 comparisons 
Dunnett’s procedure            27 (average)                  9 comparisons 
Multiple t-procedure           17.1 (average)                α comparison-wise 
F-test                         15                            1 test 

Average means, for instance, $\bar n = (9 n_i + n_{10})/10$. 




Table 11.14 Half widths of simultaneous confidence intervals for the contrasts of 
Example 11.4. 

Method     L1        L2       L3       L4        L5, ..., L10 
Scheffé    146.67    53.56    95.30    468.43    21.86 
Dunn        99.63    36.38    64.74    318.20    14.85 
Tukey      150.69    50.23    83.72    418.59    16.74 


11.4 Exercises 


11.1 Corresponding to Problem 1 and 1A, calculate the sample sizes of eight 
populations to obtain 

$$P_C \ge 0.99 \quad (P_C > 0.99)$$

for the selection of the t = 1, 2, 3, 4 best populations if d/σ = 0.1; 
0.2; 0.5 and 1. 


11.2 Calculate the minimal experimental size concerning the multiple t-test 
for five groups and the comparison-wise risks α = 0.05 and β = 0.05; 
0.1 and 0.2 for δ = σ and δ = 0.5σ. 


11.3 Calculate the minimal experimental size concerning the Tukey test for 
a = 3, 4, 5, 10, 20 groups and the experiment-wise risks $\alpha_e$ = 0.05 and 
0.10 as well as the comparison-wise risks β = 0.05; 0.1 and 0.2 for δ = σ 
and δ = 0.5σ. 
Experimental Designs 


Experimental designs originated in the early years of the 20th century mainly in 
agricultural field experimentation in connection with open land variety testing. 
The centre was Rothamsted Experimental Station near London, where Sir 
Ronald Aylmer Fisher was the head of the statistical department (since 
1919). There he wrote one of the first books about statistical design of experi- 
ments (Fisher, 1935), a book that was fundamental, and promoted statistical 
technique and application. The mathematical justification of the methods 
was not stressed and proofs were often barely sketched or omitted. In this book 
Fisher also outlined the Lady tasting tea, which is now a famous design of a sta- 
tistical randomised experiment that uses Fisher’s exact test and is the original 
exposition of Fisher’s notion of a null hypothesis. 

Because soil fertility in trial fields varies enormously, a field is partitioned 
into so-called blocks and each block subdivided in plots. It is expected that 
the soil within the blocks is relatively homogeneous so that yield differences 
of the varieties planted at the plots of one block are due only to the varieties 
and not due to soil differences. To ensure homogeneity of soil within blocks, 
the blocks must not be too large. On the other hand, the plots must be large 
enough so that harvesting (mainly with machines) is possible. Consequently, 
only a limited number of plots within the blocks are possible, and only a 
limited number of varieties within the blocks can be tested. If all varieties 
can be tested in each block, we speak about a complete block design. The 
number of varieties is often larger than the number of plots in a block. 
Therefore incomplete block designs were developed, chief among them 
completely balanced incomplete block designs, ensuring that all yield differ- 
ences of varieties can be estimated with equal variance using models of the 
analysis of variance. 

If two disturbing influences occur in two directions (as humidity from east to 
west and soil fertility from north to south), then the so-called row—column 
designs (RCDs) are in use, especially Latin squares (LS). 
The experimental designs originally developed in agriculture were soon used 
in medicine, in engineering or more generally in all empirical sciences. Varieties 
(v) were generalised to treatments (t), and plots to experimental units. But even 
today the number v of treatments or the letter y (from yield) in the models of the 
analysis of variance recalls the agricultural origin. Sometimes in technical 
books or papers, t is used in place of v. 

Later, experimental designs have been handled not only within statistics but also 
within combinatorics, and many were published in statistical journals (such as Bio- 
metrika) as well as in combinatorial ones (such as Journal of Combinatorial Designs). 


12.1 Introduction 


Experimental designs are an important part in the planning (designing) of 
experiments. The main principles are (the three Rs) 


1) Replication 
2) Randomisation 
3) Reduction of the influence of noisy factors (blocking) 


Statements in the empirical sciences can almost never be derived from an 
experiment with only one measurement. We often use the variance as a 
measure of variability of the observed character, and then we need at least two 
observations (replications) to estimate it (in statistics the term replication 
mainly means one measurement; thus, two measurements are two replications 
and not one measurement and one replication). 

Therefore, two replications are the lower bound for the number of replica- 
tions. The sample size (the number of replications) has to be chosen and was 
already discussed at several places in previous chapters. 

Experimental designs are used mainly for the reduction of possible influences 
of known nuisance factors. 

This is the main topic of this chapter, but initially we consider the situation 
where the nuisance factors are not known or not graspable. In this case, we try to 
solve the problem by randomisation here understood as the unrestricted ran- 
dom assignment of the experimental units to the treatments (not vice versa). 
Randomisation is (as shown in Chapter 1) also understood as the random selec- 
tion of experimental units from a universe. But in this chapter in designing 
experiments, we assume that experimental units and blocks are already ran- 
domly selected. 

Randomisation is used to keep the probability of some bias by some unknown 
nuisance factors as small as possible. It shall ensure that statistical models as 
base for planning and analysing represent the situation of an experiment ade- 
quately and the analysis with statistical methods is justified. 




We distinguish between pure and restricted forms of randomisation in experi- 
mental designs. We at first assume that the experimental material is unstruc- 
tured, which means there is no blocking. 

This is the simplest form of an experimental design. If in an experimental 
design exactly $n_i$ experimental units are randomly allocated to the ith of v treat- 
ments ($\sum n_i = N$), we call this a complete or unrestricted randomisation, and we 
call the experimental design a simple or a completely randomised experimental 
design. Such designs were used in the previous chapters. 
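
A complete randomisation can be carried out with any random number generator. The sketch below is one possible way to do it: the N unit numbers are shuffled and then cut into groups of the prescribed sizes $n_i$. The function name and the optional seed are illustrative choices.

```python
import random


def completely_randomise(sizes, seed=None):
    """Randomly allocate units 1..N to treatments with group sizes n_1, ..., n_v.

    Returns a dict mapping treatment number i to the sorted list of unit
    numbers allocated to it; N is the sum of the group sizes.
    """
    rng = random.Random(seed)
    units = list(range(1, sum(sizes) + 1))
    rng.shuffle(units)                       # unrestricted random order
    allocation, start = {}, 0
    for i, n_i in enumerate(sizes, start=1):
        allocation[i] = sorted(units[start:start + n_i])
        start += n_i
    return allocation


# Hypothetical example: v = 3 treatments with n = (3, 3, 2), hence N = 8 units.
allocation = completely_randomise([3, 3, 2], seed=42)
for treatment, units in allocation.items():
    print(treatment, units)
```

Every unit receives exactly one treatment, and each treatment receives exactly $n_i$ units.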

In this chapter we define experimental designs as model independent 
(i.e. independent of the models for the analysis, e.g. for the analysis of 
variance) and consider experiments with N experimental units, numbered 
from 1 to N, and these numbers are used as names of the units. In 
an experiment the effects of p treatment factors $A^{(1)}, \ldots, A^{(p)}$ have to 
be estimated or tested, and the effects of q nuisance factors $B^{(1)}, \ldots, B^{(q)}$ 
must be eliminated. The possible values of a factor are called fac- 
tor levels (levels). 

N and p are positive integers, and q is a nonnegative integer. An exper- 
iment is always the combination of an experimental design with a rule of 
randomisation. 


Definition 12.1 The assignment of a given number N > 2 of experimental 
units to the levels $A_i^{(h)}$ (i = 1, ..., $v_h$; h = 1, ..., p) of p ≥ 1 treatment factors $A^{(1)}$, 
..., $A^{(p)}$ and the levels $B_j^{(c)}$ (j = 1, ..., $b_c$; c = 1, ..., q) of q ≥ 0 nuisance factors (block 
factors) $B^{(1)}, \ldots, B^{(q)}$ is called a p-factorial experimental design with q block fac- 
tors. If p = 1 the one-factorial experimental design is called a simple experimen- 
tal design. If p > 1 we speak about a factorial experiment. If q = 0 we speak about 
a completely randomised or a simple experimental design. 

Simple experimental designs are, for instance, the base of the methods in 
Chapters 2 and 3. The randomisation in these experimental designs means that 
N experimental units are randomly assigned (e.g. by random number genera- 
tors) to the v level combinations of some treatment factors or the v levels of 
one treatment factor. 

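To illustrate the assignment rules of Definition 12.1, we use matrices $U_h$ and $Z_c$, 
combining to the matrix 

$$Z = (U_1, \ldots, U_p; Z_1, \ldots, Z_q). \qquad (12.1)$$

The elements of the submatrices $U_h$ and $Z_c$ are defined as follows: 

$$u_{lk}^{(h)} = \begin{cases} 1, & \text{if the } l\text{-th experimental unit is assigned to the } k\text{-th level of } A^{(h)} \\ 0, & \text{otherwise} \end{cases}$$

and 

$$z_{lk}^{(c)} = \begin{cases} 1, & \text{if the } l\text{-th experimental unit is assigned to the } k\text{-th level of } B^{(c)} \\ 0, & \text{otherwise.} \end{cases}$$

We obtain 

$$U_h^T 1_N = r^{(h)}, \quad r^{(h)T} = \left(r_1^{(h)}, \ldots, r_{v_h}^{(h)}\right); \qquad Z_c^T 1_N = k^{(c)}, \quad k^{(c)T} = \left(k_1^{(c)}, \ldots, k_{b_c}^{(c)}\right). \qquad (12.2)$$

We consider mainly one-factorial experimental designs and then write A for 
$A^{(1)}$ and v for $v_1$ and further 

$$r^T = r^{(1)T} = \left(r_1^{(1)}, \ldots, r_v^{(1)}\right) = (r_1, \ldots, r_v)$$

with $r_i \ge 1$ and $N = \sum_{i=1}^{v} r_i \ge v \ge 1$. 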
Example 12.1 We consider the structure of Example 5.12. 

                     Forage crop 
Storage      Green rye      Lucerne 
Glass        8.39           9.44 
             7.68           10.12 
             9.46           8.79 
             8.12           8.89 
Sack         5.42           5.56 
             6.21           4.78 
             4.98           6.18 
             6.04           5.91 

Here N = 16, q = 0 and p = 2. In the first column are the elements 1-8 and in the 
second column the elements 9-16 (numbered from above). 
The factors are $A^{(1)}$ and $A^{(2)}$ and further 

$$U_1^T = \begin{pmatrix} 1&1&1&1&1&1&1&1&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&1&1&1&1&1&1&1&1 \end{pmatrix}$$

and 

$$U_2^T = \begin{pmatrix} 1&1&1&1&0&0&0&0&1&1&1&1&0&0&0&0 \\ 0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&1 \end{pmatrix}.$$

Besides $v_1 = 2$, $v_2 = 2$ and $r^{(h)T} = (8, 8)$ with h = 1, 2. 
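
The indicator matrices of Example 12.1 can be built mechanically from the level of each factor for every unit. The sketch below assumes the numbering described in the example (units 1-8 in the first column of the data table, units 9-16 in the second, glass before sack within each column). The helper names are illustrative.

```python
def indicator_matrix(levels, n_levels):
    """N x n_levels matrix with entry 1 if unit l has level k, else 0."""
    return [[1 if level == k else 0 for k in range(1, n_levels + 1)]
            for level in levels]


def column_sums(matrix):
    """U^T 1_N: the vector of replications of the factor levels."""
    return [sum(column) for column in zip(*matrix)]


# Level of each of the N = 16 units, in the unit numbering of the example.
crop = [1] * 8 + [2] * 8                    # 1 = green rye, 2 = lucerne
storage = ([1] * 4 + [2] * 4) * 2           # 1 = glass, 2 = sack

U1 = indicator_matrix(crop, 2)
U2 = indicator_matrix(storage, 2)

r1 = column_sums(U1)                        # replications of the levels of A(1)
r2 = column_sums(U2)                        # replications of the levels of A(2)

# Print the transposed matrices row by row, as in the example.
for U in (U1, U2):
    for row in zip(*U):
        print("".join(str(x) for x in row))
print(r1, r2)
```

Both replication vectors come out as (8, 8), in line with the example.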


Definition 12.2 A one-factorial experimental design is K-balanced of order t, 
if a given operator K transfers the matrix $Z = (U_1, Z_1)$ in (12.1) into a (v × v)- 
matrix with identical elements in the main diagonal and exactly t different ele- 
ments outside the main diagonal. 


12.2 Block Designs 


Block designs are experimental designs to eliminate one disturbance variable. In 
case of a quantitative nuisance factor, we also can use the analysis of covariance 
if the type of the dependency and the underlying function are known (for 
instance linear or quadratic). The parameters are estimated from the observed 
values of the character and the nuisance factor as already shown in Chapter 10. 
A generally applicable method (i.e. also for qualitative nuisance factors) is block- 
ing or stratification by the levels of the nuisance factor. We restrict ourselves to 
one treatment factor. This is no loss of generality. If we have several treatment 
factors, we consider all level combinations of these treatment factors as treat- 
ments of some new factor. 

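As already said, a block design helps to eliminate the effects of a disturbance 
variable, that is, the matrix Z in (12.1) contains just one matrix $Z_1$ and is of the 
form $Z = (U_1, \ldots, U_p; Z_1) = (Z_0; Z_1)$ with $Z_0 = (U_1, \ldots, U_p)$. We form 

$$Z^T Z = \begin{pmatrix} Z_0^T Z_0 & Z_0^T Z_1 \\ Z_1^T Z_0 & Z_1^T Z_1 \end{pmatrix}.$$

$Z_0^T Z_0$ is for one treatment factor a diagonal matrix of the form 
$U_1^T U_1 = \mathrm{diag}(r_1, \ldots, r_v)$; $Z_1^T Z_1 = \mathrm{diag}(k_1, \ldots, k_b)$ is also a diagonal matrix. Now 
we have 

$$Z^T Z = \begin{pmatrix} U_1^T U_1 & U_1^T Z_1 \\ Z_1^T U_1 & Z_1^T Z_1 \end{pmatrix} = \begin{pmatrix} \mathrm{diag}(r_1, \ldots, r_v) & U_1^T Z_1 \\ Z_1^T U_1 & \mathrm{diag}(k_1, \ldots, k_b) \end{pmatrix}.$$

The submatrix $U_1^T Z_1 = \mathcal{N}$ is called incidence matrix. By this $Z^T Z$ has the form 

$$Z^T Z = \begin{pmatrix} \mathrm{diag}(r_1, \ldots, r_v) & \mathcal{N} \\ \mathcal{N}^T & \mathrm{diag}(k_1, \ldots, k_b) \end{pmatrix}.$$

A block design has therefore a finite incidence structure built up from an inci- 
dence matrix. It consists of a finite set {1, 2, ..., v} of v elements (treatments) and a finite set 
$\{B_1, B_2, \ldots, B_b\}$ of b sets, called blocks, which are the levels of the nuisance (block) 
factor. The elements of the incidence matrix $\mathcal{N} = (n_{ij})$ lead to the two diagonal 
matrices because $\mathcal{N} 1_b = (r_1, \ldots, r_v)^T$ and $\mathcal{N}^T 1_v = (k_1, \ldots, k_b)^T$ contain the 
diagonal elements of $\mathrm{diag}(r_1, \ldots, r_v)$ and $\mathrm{diag}(k_1, \ldots, k_b)$. 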


Definition 12.3 The elements of the incidence matrix $N = (n_{ij})$ with v rows and
b columns show how often the ith treatment (representing the ith row) occurs
in the jth block (representing the jth column). If all $n_{ij}$ are either 0 or 1, the
incidence matrix and the corresponding block design are called binary. The
b column sums $k_j$ of the incidence matrix are the elements of $\operatorname{diag}(k_1, \dots, k_b)$
and are called block sizes. The v row sums $r_i$ of the incidence matrix are the
elements of $\operatorname{diag}(r_1, \dots, r_v)$ and are called replications. A block design is
complete, if the elements of the incidence matrix are all positive ($n_{ij} \ge 1$). A block
design is incomplete, if the incidence matrix contains at least one zero. Blocks
are called incomplete, if in the corresponding column of the incidence matrix,
there is at least one zero.

In block designs the randomisation has to be done as follows: the experimen- 
tal units in each block are randomly assigned to the treatments, occurring in this 
block. This randomisation is done separately for each block. In the complete 
randomisation in Section 12.1, we only have to replace N by the block size k 
and v by the number of plots in the block. 

In complete block designs with v plots per block, where each of them is 
assigned to exactly one of the v treatments, the randomisation is completed. 
If k < v (incomplete block designs), the abstract blocks obtained by the
mathematical construction have to be randomly assigned to the real blocks using
the method in Section 12.1 for N = b, where b is the number of blocks.

For incomplete binary block designs in place of the incidence matrix often a 
shorter writing is in use. Each block is represented by a bracket including the 
symbols (numbers) of the treatments, contained in the block. 


Example 12.2 A block design with v = 4 treatments and b = 6 blocks may have 
the incidence matrix: 

101000 

010111 

101000 

010111 


Because zeros occur, we have an incomplete block design. This can now be writ- 
ten as 


{(1,3), (2,4), (1,3), (2,4), (2,4), (2,4)}.


The first bracket represents block 1 with treatments 1 and 3 corresponding to the 
fact that in column 1 (representing the first block) in row 1 and 3 occurs a one. 
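
The translation between the incidence matrix and the bracket notation can be sketched in a few lines of Python. This is only an illustration for binary designs; the function names `blocks_from_incidence`, `replications` and `block_sizes` are hypothetical and not part of any package mentioned in this book.

```python
# Incidence matrix of Example 12.2: rows = treatments, columns = blocks.
N = [
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
]


def blocks_from_incidence(n):
    """Bracket notation of a binary design: one tuple of treatments per block."""
    v, b = len(n), len(n[0])
    return [tuple(i + 1 for i in range(v) if n[i][j] > 0) for j in range(b)]


def replications(n):
    """Row sums r_i of the incidence matrix."""
    return [sum(row) for row in n]


def block_sizes(n):
    """Column sums k_j of the incidence matrix."""
    return [sum(row[j] for row in n) for j in range(len(n[0]))]


print(blocks_from_incidence(N))
print(replications(N), block_sizes(N))
```

Both the replications and the block sizes add up to the number of experimental units, here 12.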




Definition 12.4 (Tocher, 1952)
A block design for which in Definition 12.2 the operator K maps the matrix Z
into the matrix $NN^T$ is called $NN^T$-balanced. An $NN^T$-balanced incomplete
block design of order t is called partially balanced with t association classes.


Definition 12.5 A block design with a symmetric incidence matrix is a symmetric
block design. If all treatments in a block design occur equally often (the number
of replications is $r_i = r$ for all i), it is called equireplicate. If the number of plots in a
block design is the same in each block ($k_j = k$ for all j), it is called proper.

It can easily be seen that the sum of the replications $r_i$ as well as the sum of all
block sizes $k_j$ equals the number N of the experimental units of a block design.
Therefore, for each block design we have

$$\sum_{i=1}^{v} r_i = \sum_{j=1}^{b} k_j = N, \qquad (12.3)$$

especially for equireplicate and proper block designs ($r_i = r$ and $k_j = k$); this gives

$$vr = bk. \qquad (12.4)$$

In symmetric block designs we have b = v and $r_i = k_i$ (i = 1, ..., v).


Definition 12.6 

a) An incomplete block design is connected, if for each pair $(A_k, A_l)$ of treat-
ments $A_1, \dots, A_v$, there exists a chain of treatments starting with $A_k$ and end-
ing with $A_l$ in which each two adjacent treatments in this chain occur together
in at least one block. Otherwise, the block design is disconnected.

b) Alternatively we say: a block design with incidence matrix N is disconnected
if, by permuting its rows and columns in a suitable way, N can be trans-
formed into a matrix M that can be written as the direct sum of at least
two matrices. Otherwise, it is connected.


Both parts of Definition 12.6 are equivalent; the proof is left to the reader. 

Both parts of this definition are very abstract and their meaning is perhaps 
unclear. But the feature ‘connected’ is very important for the analysis. Discon- 
nected block designs, for instance, cannot be analyzed as a whole by the analysis 
of variance (side conditions not fulfilled), but rather as two or more independent 
experimental designs. 


Example 12.2 - Continued In the design of Example 12.2, the first and the 
second treatments occur together in none of the six blocks. There is no chain of 
treatments as in Definition 12.6, and therefore the design is a disconnected 
block design. This can be seen if the blocks and the treatments are renumbered, 
which means that the columns and the rows of the incidence matrix are properly
interchanged. We interchange the blocks 2 and 3 and the treatments 1 and 4.
Thus, in the incidence matrix the columns 2 and 3 and the rows 1 and 4 are 
interchanged. The result is the matrix: 


001111 
001111 
110000 
110000 


And this is the direct sum of two matrices, and that means that we have two 
designs with two separate subsets of treatments. In the first design there are 
two treatments (1 and 2) in four blocks, while in the second design there are 
two further treatments (3 and 4) in two other blocks. 
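
Part (a) of Definition 12.6 can be checked mechanically by a graph search over the treatments. The following Python sketch is one possible way to do this; the function name `is_connected` is hypothetical, and the second matrix is the incidence matrix of the elementary BIBD with v = 7, k = 3 (treatments as rows), used here only as a connected counterpart.

```python
def is_connected(n):
    """True if every treatment can be reached from treatment 1 by a chain
    in which adjacent treatments share at least one block (Definition 12.6a)."""
    v, b = len(n), len(n[0])
    seen = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in range(b):
            if n[i][j] > 0:               # treatment i occurs in block j
                for h in range(v):
                    if n[h][j] > 0 and h not in seen:
                        seen.add(h)       # h shares block j with i
                        stack.append(h)
    return len(seen) == v


# Example 12.2: treatments {1, 3} and {2, 4} never meet in a block.
EXAMPLE_12_2 = [
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
]

# A connected design for comparison (v = b = 7, k = r = 3).
BIBD_7_3 = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 0],
]

print(is_connected(EXAMPLE_12_2), is_connected(BIBD_7_3))
```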

In the rest of this chapter, we only consider complete (and thus by definition
connected) or connected incomplete block designs. Further, we restrict
ourselves to proper and equireplicate block designs.


Definition 12.7 Let $N_i$; i = 1, 2 be the incidence matrices of two block designs
with the parameters $v_i, b_i, k_i, r_i$. The Kronecker product $N = N_1 \otimes N_2$ is the inci-
dence matrix of a Kronecker product design with the parameters
$v = v_1 v_2$, $b = b_1 b_2$, $k = k_1 k_2$, $r = r_1 r_2$.


Theorem 12.1 If the $N_i$, i = 1, 2 in Definition 12.7 are binary, then the Kro-
necker product design with the incidence matrix $N = N_1 \otimes N_2$ is also binary,
and we have $NN^T = N_1 N_1^T \otimes N_2 N_2^T$. If the designs
with the incidence matrices $N_i$; i = 1, 2 are both $N_i N_i^T$-balanced of order $t_i$, so
are $N = N_1 \otimes N_2$ and $N^* = N_2 \otimes N_1$ incidence matrices of $NN^T$-balanced
and $N^* N^{*T}$-balanced block designs, respectively, of order $t^* \le (t_1 + 1)(t_2 + 1) - 1$.


Proof: The first part of the theorem is a consequence of the definition of Kro-
necker product designs. Because the design is $N_i N_i^T$-balanced of order $t_i$, these
matrices have exactly $t_i + 1$ (with the main diagonal element) different elements.
Because $NN^T = N_1 N_1^T \otimes N_2 N_2^T$, in $NN^T$ (or $N^* N^{*T}$) all $(t_1 + 1)(t_2 + 1)$ pro-
ducts will be found. But all elements in the main diagonal of $NN^T$ are equal,
so that at most $(t_1 + 1)(t_2 + 1) - 1$ different values in $NN^T$ (or $N^* N^{*T}$) exist.
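
The statements of Theorem 12.1 can be tried out numerically. The sketch below uses plain Python lists; the helper names `kron` and `gram` are hypothetical, and the small design with v = b = 3, k = r = 2 is chosen only for illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def gram(n):
    """N N^T for an incidence matrix N."""
    return [[sum(p * q for p, q in zip(r1, r2)) for r2 in n] for r1 in n]


# Design with v = b = 3, k = r = 2: N N^T has diagonal 2 and off-diagonal 1,
# so it is balanced of order t = 1.
N1 = [[1, 1, 0],
      [1, 0, 1],
      [0, 1, 1]]

N = kron(N1, N1)   # Kronecker product design: v = b = 9, k = r = 4

off_diagonal = {gram(N)[i][j] for i in range(9) for j in range(9) if i != j}
print(gram(N) == kron(gram(N1), gram(N1)), sorted(off_diagonal))
```

Here two different off-diagonal values occur, which is within the bound (1 + 1)(1 + 1) - 1 = 3 of the theorem.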


12.2.1 Completely Balanced Incomplete Block Designs (BIBD) 


Definition 12.8 A (completely) balanced incomplete block design (BIBD) is a
proper and equireplicate incomplete block design with the additional property
that each pair of treatments occurs in equally many, say, in λ blocks. Following
Definition 12.4 it is an $NN^T$-balanced incomplete block design with t = 1. A
BIBD with v treatments with r replications in b blocks of size k < v is called
a B(v, k, λ)-design. A BIBD for a pair (v, k) is called elementary, if it cannot
be decomposed into at least two BIBD for this pair (v, k). A BIBD for a pair (v, k)
is a smallest BIBD for this pair (v, k), if r (and by this also b and λ) is minimal.

In B(v, k, λ) only three of the five parameters v, b, k, r, λ of a BIBD occur. But
this is sufficient, because exactly three of the five parameters can be fixed, while
the two others are automatically fixed.


This can be seen as follows. The number of possible pairs of treatments in the
design is $\binom{v}{2} = \frac{v(v-1)}{2}$. However, in each of the b blocks exactly
$\binom{k}{2} = \frac{k(k-1)}{2}$ pairs of treatments exist so that

$$\lambda v(v-1) = bk(k-1)$$

if each of the $\binom{v}{2}$ pairs of treatments occurs λ times in the experiment. From
Formula (12.4) we replace bk with vr and after division by v we obtain

$$\lambda(v-1) = r(k-1). \qquad (12.5)$$


The Equations (12.4) and (12.5) are necessary conditions for the existence of a
BIBD. These necessary conditions reduce the set of possible quintuples of inte-
gers v, b, r, k, λ to the subset for which the conditions (12.4) and (12.5)
are fulfilled. If we characterize a BIBD by three of these parameters, like {v, k, λ},
the other parameters can be calculated via (12.4) and (12.5).

The necessary conditions are not always sufficient for the existence of a BIBD.
To show this we give a counter example.
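
As a small illustration of how b and r follow from {v, k, λ}, the two conditions can be solved in integer arithmetic. The function name `bibd_b_and_r` is hypothetical; the sketch assumes k > 1 and only tests the necessary conditions (12.4) and (12.5), not the existence of a design.

```python
def bibd_b_and_r(v, k, lam):
    """Return (b, r) solving (12.4) and (12.5), or None if they have
    no integer solution for the given v, k and lambda."""
    if (lam * (v - 1)) % (k - 1) != 0:
        return None                      # r = lam (v - 1) / (k - 1) not an integer
    r = lam * (v - 1) // (k - 1)
    if (v * r) % k != 0:
        return None                      # b = v r / k not an integer
    return v * r // k, r


print(bibd_b_and_r(7, 3, 1))    # parameters of the design with v = 7, k = 3
print(bibd_b_and_r(8, 3, 1))    # conditions violated
```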


Example 12.3 We show that the conditions that are necessary for the exist-
ence of a BIBD need not be sufficient. The values

v = 16, r = 3, b = 8, k = 6, λ = 1

give 16·3 = 8·6 and 1·15 = 3·5 by (12.4) and (12.5), but no BIBD with these para-
meters exists.

Besides (12.4) and (12.5) there is a further necessary condition, Fisher’s
inequality

$$b \ge v. \qquad (12.6)$$

This inequality is not fulfilled in Example 12.3. But even if (12.4), (12.5) and
(12.6) are valid, a BIBD does not always exist. Examples for that are


v = 22, k = 8, b = 33, r = 12, λ = 4

and

v = 34, k = 12, r = 12, b = 34, λ = 4.

The smallest existing BIBD for v = 22, k = 8 and for v = 34, k = 12 have the
parameters

v = 22, k = 8, b = 66, r = 24, λ = 8 and
v = 34, k = 12, r = 18, b = 51, λ = 6, respectively.


A BIBD (a so-called unreduced or trivial BIBD) for any positive integer v and
k < v can always be constructed by writing down all possible k-tuples from v
elements. Then $b = \binom{v}{k}$, $r = \binom{v-1}{k-1}$ and $\lambda = \binom{v-2}{k-2}$.

Often a smaller BIBD (with fewer blocks than the trivial one) can be found as a
subset of a trivial BIBD. A case where such a reduction is not possible is that
with v = 8 and k = 3. This is the only case for v ≤ 25 and 2 < k < v - 1 where no
smaller BIBD than the trivial one exists. Rasch et al. (2016) formulate and sup-
port the following conjecture.


Conjecture:
The case v = 8 and k = 3 is the only case for k > 2 and k < v - 2 where the
trivial BIBD is elementary.


This conjecture is even now neither confirmed nor disproved. But the follow- 
ing theorem is proved. 


Theorem 12.2 The conjecture above is true, if at least one of the following
conditions is fulfilled:

a) v < 26, 2 < k < v - 1
b) k < 6
c) for v > 8 and k a BIBD exists with b = v(v - 1).


Proof: For (a) and (b) the theorem is proved constructively by showing that for all
parameter combinations there exists a non-trivial BIBD.
If there exists a BIBD with b = v(v - 1), then for each k < v/2 we write

$$\binom{v}{k} = \frac{v}{1}\cdot\frac{v-1}{2}\cdot\frac{v-2}{3}\cdots\frac{v-k+1}{k} = v(v-1)\,\frac{v-2}{6}\cdot\frac{v-3}{4}\cdots\frac{v-k+1}{k}.$$

All factors of

$$\frac{v-2}{6}\cdot\frac{v-3}{4}\cdots\frac{v-k+1}{k}$$

are larger than 1, so that $v(v-1) < \binom{v}{k}$.
This completes the proof.


The block designs with b = v(v - 1) blocks often are not the smallest. One
reason is that in some designs each block occurs w times. Removing w - 1 copies
of each block leads to a BIBD with $\frac{v(v-1)}{w}$ blocks.

In the meantime F. Teuscher (2017, Constructing a BIBD with v = 26, k = 11,
b = 130, r = 55, λ = 22, personal communication) showed that the conjecture
is true for v = 26, k = 11, because he constructed a design with v = 26,
k = 11, b = 130, r = 55, λ = 22.

In constructing BIBD we can restrict ourselves to $k \le \frac{v}{2}$ due to the defini-
tion below.


Definition 12.9 A complementary block design for a given BIBD for a pair 
(v,k) is a block design for (v,v — k) with the same number of blocks, so that each 
block of the complementary block design contains just those treatments not 
occurring in the corresponding block of the original BIBD. 

We receive (parameters of the complementary design are indicated with *)

v* = v, b* = b, k* = v - k, r* = b - r.

The incidence matrix of the complementary design is $N^* = 1_{vb} - N$, where $1_{vb}$
is the (v × b)-matrix of ones. Using $NN^T = (r-\lambda) I_v + \lambda 1_{vv}$, this adds up to

$$N^* N^{*T} = (1_{vb} - N)(1_{vb} - N)^T = b\,1_{vv} - r\,1_{vv} - r\,1_{vv} + NN^T = (r-\lambda) I_v + (b - 2r + \lambda)\,1_{vv}.$$

That means that the complementary block design of a BIBD is also a BIBD,
with λ* = b - 2r + λ.


Theorem 12.3 The complementary block design to a given BIBD for a pair
(v,k) is a BIBD for (v,v - k) with the parameters v*, b*, k*, r*, λ* and v* = v,
b* = b, k* = v - k, r* = b - r, λ* = b - 2r + λ.


From this it follows that a BIBD cannot be complementary to a block design that 
is not a BIBD. 

Of course smallest (v,k) — BIBD are elementary, but not all elementary BIBD 
are smallest, as we will show in Example 12.4. 

In applications the number v of treatments and the block size k are often 
given, and we like to find the smallest BIBD for a pair (v,k). This is possible with 
the R-programme in OPDOE (Rasch et al., 2011) for v < 25. 

If k = 1 each of the v elements defines a block of a degenerate BIBD with v = b,
r = 1 and λ = 0. These BIBD are trivial and elementary. The same is true for its
complementary BIBD with v = b, r = k = v - 1 and λ = v - 2. Here in each block
another treatment is missing. Even for k = 2 all BIBD and their complementary
BIBD are trivial as well as elementary. That is why in the future, we will confine
ourselves to 3 ≤ k ≤ v/2.


Definition 12.10 A BIBD is said to be α-resolvable or an α-RBIBD, if its blocks
can be arranged to form l ≥ 2 classes in such a way that each treatment occurs
exactly α times in each class. We write RB(v, k, λ). An α-RBIBD is said to be affine
α-resolvable, if each pair of blocks from a given class has exactly $q_1$ treatments
in common and pairs of blocks from different classes have $q_2$ treatments in com-
mon. A 1-resolvable BIBD is called resolvable or an RBIBD.

For affine α-resolvable RBIBD we have b = v + r - 1 and $\alpha = \frac{k^2}{v}$.


Example 12.4 The BIBD with v = 9, k = 3, λ = 1 and b = 12 is affine
1-resolvable in four classes (the columns of the scheme)

(1,2,3) (1,4,7) (1,5,9) (1,6,8)
(4,5,6) (2,5,8) (2,6,7) (3,5,7)
(7,8,9) (3,6,9) (3,4,8) (2,4,9)

because $\alpha = \frac{k^2}{v} = \frac{9}{9} = 1$.
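
The properties claimed in Example 12.4 can be checked directly. The following Python sketch (all names are hypothetical) verifies that every class contains each treatment exactly once, that every pair of treatments occurs in exactly one block, and it collects the numbers of common treatments of block pairs within and between classes.

```python
from itertools import combinations

# Resolution classes of Example 12.4 (the columns of the scheme).
CLASSES = [
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
    [(1, 4, 7), (2, 5, 8), (3, 6, 9)],
    [(1, 5, 9), (2, 6, 7), (3, 4, 8)],
    [(1, 6, 8), (3, 5, 7), (2, 4, 9)],
]


def is_resolution(classes, v):
    """True if every class contains each treatment 1..v exactly once."""
    return all(
        sorted(t for blk in cls for t in blk) == list(range(1, v + 1))
        for cls in classes
    )


def pair_multiplicities(classes):
    """Number of blocks containing each pair of treatments."""
    counts = {}
    for cls in classes:
        for blk in cls:
            for pair in combinations(sorted(blk), 2):
                counts[pair] = counts.get(pair, 0) + 1
    return counts


def intersection_sizes(classes):
    """Sets of intersection sizes for block pairs within and between classes."""
    within, between = set(), set()
    tagged = [(c, set(blk)) for c, cls in enumerate(classes) for blk in cls]
    for (c1, b1), (c2, b2) in combinations(tagged, 2):
        (within if c1 == c2 else between).add(len(b1 & b2))
    return within, between


print(is_resolution(CLASSES, 9), intersection_sizes(CLASSES))
```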


Definition 12.11 If N is the incidence matrix of a BIBD, then $N^T$ is the inci-
dence matrix of the dual BIBD, obtained by interchanging rows and columns in
the incidence matrix of a BIBD.

The parameters v*, b*, r*, k* and λ* of the dual BIBD of a BIBD with para-
meters v, b, r, k and λ are v* = b, b* = v, r* = k, k* = r and λ* = λ.


Example 12.5 For v = 7 and k = 3 the trivial BIBD is given by the following: 


(1,2,3) (1,3,6) (1,6,7) (2,4,7) (3,5,6)
(1,2,4) (1,3,7) (2,3,4) (2,5,6) (3,5,7)
(1,2,5) (1,4,5) (2,3,5) (2,5,7) (3,6,7)
(1,2,6) (1,4,6) (2,3,6) (2,6,7) (4,5,6)
(1,2,7) (1,4,7) (2,3,7) (3,4,5) (4,5,7)
(1,3,4) (1,5,6) (2,4,5) (3,4,6) (4,6,7)
(1,3,5) (1,5,7) (2,4,6) (3,4,7) (5,6,7)

An elementary BIBD has the parameters b = 7, r = 3, λ = 1 and the blocks
{(1,2,4), (1,3,7), (1,5,6), (2,3,5), (2,6,7), (4,5,7), (3,4,6)} (in bold print in the
original table above).
The incidence matrix is


1110000 
1001100 
0101001 
1000011 
0011010 
0010101 
0100110 


The complementary BIBD is 
{(1,2,3,6), (1,3,4,5), (1,4,6,7), (1,2,5,7), (2,4,5,6), (2,3,4,7), (3,5,6,7)}. 
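
The parameters of the designs in this example, and the statement of Theorem 12.3 about the complementary design, can be verified by counting. The sketch below is a plain enumeration; the function names `design_parameters` and `complement` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def design_parameters(blocks, v):
    """Return (b, r, k, lam) if `blocks` is a BIBD on treatments 1..v, else None."""
    sizes = {len(set(blk)) for blk in blocks}
    reps = Counter(t for blk in blocks for t in blk)
    pairs = Counter(p for blk in blocks for p in combinations(sorted(blk), 2))
    if len(sizes) != 1 or len({reps[t] for t in range(1, v + 1)}) != 1:
        return None                      # not proper or not equireplicate
    if len({pairs[p] for p in combinations(range(1, v + 1), 2)}) != 1:
        return None                      # pairs do not occur equally often
    return len(blocks), reps[1], sizes.pop(), pairs[(1, 2)]


def complement(blocks, v):
    """Complementary design: each block is replaced by the treatments it lacks."""
    return [tuple(t for t in range(1, v + 1) if t not in blk) for blk in blocks]


TRIVIAL = list(combinations(range(1, 8), 3))          # all 35 triples
ELEMENTARY = [(1, 2, 4), (1, 3, 7), (1, 5, 6), (2, 3, 5),
              (2, 6, 7), (4, 5, 7), (3, 4, 6)]

print(design_parameters(TRIVIAL, 7))
print(design_parameters(ELEMENTARY, 7))
print(design_parameters(complement(ELEMENTARY, 7), 7))
```

For the complementary design the count reproduces r* = b - r = 4 and λ* = b - 2r + λ = 2.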


A further elementary BIBD with parameters b = 7, r = 3, λ = 1 is the septuplet
printed in italics (but not bold) in the trivial BIBD. It is isomorphic to the BIBD
with the italic and bold printed blocks. The set of the residual 21 of the 35 blocks
cannot be split up into smaller BIBD; they also build an elementary BIBD,
but of course not the smallest.

To show that there are no further BIBD with 7 blocks (and thus no BIBD
with 14 blocks) within the residual 21 blocks, we consider one of the 21 residual
blocks, namely, (1,2,3). Because r = 3 we need two further blocks with a 1, where
(1,4), (1,5), (1,6) and (1,7) are contained. The only possibility is (1,4,5) and
(1,6,7); the other possibilities are already in a block of the two elementary
designs or contradict λ = 1. The block design we are looking for must start with
(1,2,3), (1,4,5) and (1,6,7). Now we need two further blocks with a 2 with the
pairs (2,4), (2,5), (2,6) and (2,7). Possibilities are (2,4,6) with (2,5,7) or (2,4,7) with
(2,5,6).

It means we have two possibilities for the first five blocks: 


(1,2,3) or (1,2,3) 
(1,4,5) (1,4,5) 
(1,6,7) (1,6,7) 
(2,4,6) (2,4,7) 
(2,5,7) (2,5,6) 


Now we need two blocks with a 3 in each to add them to the five blocks. The
blocks (3,6,7) and (3,4,5) are not permissible, because the pairs (4,5) and (6,7) are
already present in the first five blocks. In the first quintuple (3,4,7) is permissible,
but the needed partner (3,5,6) is already in a block of the two elementary designs.
Therefore, the first quintuple must be withdrawn. In the second quintuple
we could continue with (3,5,7), but here the needed partner (3,4,6) is also gone.
Therefore the remaining 21 blocks build an elementary BIBD.

The dual BIBD of the bold printed elementary design above has the incidence
matrix


1101000
1010001
1000110
0110100
0100011
0001101
0011010


The corresponding BIBD is 
{(1,2,3), (1,4,5), (1,6,7), (2,4,7), (2,5,6), (3,4,6), (3,5,7)}

and of course elementary as well.

In the following we give some results where the necessary conditions (12.4),
(12.5) and (12.6) are sufficient.


Theorem 12.4 (Hanani, 1961, 1975; Abel and Greig, 1998; Abel et al., 2001)
The necessary conditions (12.4) to (12.6) are sufficient, if

k = 3 and k = 4 for all v > 4 and for all λ

k = 5 with exception of v = 15 and λ = 2

k = 6 for all v > 7 and λ > 1 with exception of v = 21 and λ = 2

k = 7 for all v > 7 and λ ≡ 0, 6, 7, 12, 18, 24, 30, 35, 36 (mod 42) and all λ > 30 not
divisible by 2 or 3

k = 8 for λ = 1, with 38 possible exceptions for v, namely, the values

113, 169, 176, 225, 281, 337, 393, 624, 736, 785, 1065, 1121, 1128, 1177, 1233,
1240, 1296, 1345, 1401, 1408, 1457, 1464, 1513, 1520, 1569, 1576, 1737, 1793,
1905, 1961, 2185, 2241, 2577, 2913, 3305, 3417, 3473, 3753.

From these 38 values of v exist (v, 8, 2)-BIBD with exception of v = 393, but for
λ = 2 there are further values of v: 29, 36, 365, 477, 484, 533, 540, 589 for which
the existence is not clear. The necessary conditions are sufficient for all λ > 5 and
for λ = 4 if v ≠ 22.

Because the proof of this theorem is enormous, we refer to the original liter-
ature. For λ = 4 and v = 22 there exists no BIBD; the smallest BIBD for v = 22 and
k = 8 given by the R-programme OPDOE is that for λ = 8, b = 66 and r = 24.

Theorem 12.5 (Theorem 1.2, Abel et al. (2002a, 2002b, 2004))
The necessary conditions for the existence of a (v, k = 9, λ)-BIBD in the following
cases are sufficient:

a) For λ = 2 (necessary conditions: v ≡ 1, 9 (mod 36)) with the possible exception
of v = 189, 253, 505, 765, 837, 1197, 1837 and 1845

b) For λ = 3 (necessary conditions: v ≡ 1, 9 (mod 24)) with the possible exception
of v = 177, 345 and 385

c) For λ = 4 (necessary conditions: v ≡ 1, 9 (mod 18)) with the possible exception
v = 315, 459 and 783

d) For λ = 6 (necessary conditions: v ≡ 1, 9 (mod 12)) with the possible excep-
tion v = 213

e) For λ = 8 (necessary conditions: v ≡ 0, 1 (mod 9))

f) For λ = 9 (necessary conditions: v ≡ 1 (mod 8))

g) For λ = 12 (necessary conditions: v ≡ 1, 3 (mod 6) with v ≥ 9)

h) For λ = 18, 24, 36, 72 and all further values of λ, not being divisor of 72


The proof is given in Abel et al. (2002a, 2002b, 2004), where it was stated that
the possible exceptions could not be definitely shown to be exceptions; for all
other block designs the existence was shown. Cases not yet clear are given in
Tables 12.1 and 12.2.

Hanani (1989) showed that the necessary conditions (12.4), (12.5) and (12.6)
are sufficient for the existence of a BIBD with k = 7 and λ = 3 and λ = 21 with
the possible exception for the values λ = 3 and v = 323, 351, 407, 519, 525,
575, 665.

Sun (2012) showed that if the number of treatments is a prime power, in many 
cases the necessary conditions are sufficient for the existence of a BIBD. 


Table 12.1 Values of v in not yet constructed (v, k = 9, λ)-BIBD with λ = 1.

145 153 217 225 289 297 361 369 505 793 865 873 945 1017 1081 1305 1441 1513 1585 1593
1665 1729 1809 1881 1945 1953 2025 2233 2241 2305 2385 2449 2457 2665 2737 2745 2881
2889 2961 3025 3097 3105 3241 3321 3385 3393 3601 3745 3753 3817 4033 4257 4321 4393
4401 4465 4473 4825 4833 4897 4905 5401 5473 5481 6049 6129 6625 6705 6769 6777 6913
7345 7353 7425 9505 10017 10665 12529 12537 13185 13753 13833 13969 14113 14473
14553 14625 14689 15049 15057 16497.


Table 12.2 (v, k = 9, λ)-BIBD with λ > 1 not yet constructed.

(177,9,3) (189,9,2) (213,9,6) (253,9,2) (315,9,4) (345,9,3) (385,9,3) (459,9,4) (505,9,2)
(765,9,2) (783,9,4) (837,9,2) (1197,9,2) (1837,9,2) (1845,9,2)

The following theorem contains results on the existence of symmetric BIBD:


Theorem 12.6 (Bruck-Ryser-Chowla Theorem, Mohan et al. (2004))
If the parameters v, k, λ of a BIBD fulfil the existence condition (12.5) for k = r,
then for the existence of a symmetric BIBD it is necessary that either

a) v is even and k - λ is a square number or

b) v is odd and $z^2 = (k-\lambda)x^2 + (-1)^{(v-1)/2} \lambda y^2$ has a non-trivial integer solution x, y, z.
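
Part (a) of the theorem is easy to test by computer. The sketch below covers only the case of even v (the Diophantine condition in part (b) is not implemented), and the function name `brc_even_v_condition` is hypothetical. It is applied to the parameters v = b = 34, k = r = 12, λ = 4 mentioned above, for which no BIBD exists.

```python
from math import isqrt


def brc_even_v_condition(v, k, lam):
    """Part (a) of Theorem 12.6: for even v, a symmetric BIBD can only
    exist if k - lam is a perfect square. Returns True if this holds."""
    if v % 2 != 0:
        raise ValueError("part (a) applies to even v only")
    d = k - lam
    return d >= 0 and isqrt(d) ** 2 == d


print(brc_even_v_condition(34, 12, 4))   # k - lam = 8 is not a square
print(brc_even_v_condition(16, 6, 2))    # k - lam = 4 is a square
```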


Some authors published tables of BIBD; the first for r ≤ 10 stems from Fisher
and Yates (1963). For 11 ≤ r ≤ 15 we find a table in Rao (1961) and for
16 ≤ r ≤ 20 in Sprott (1962). Takeuchi (1962) gives further tables for v ≤ 100,
k ≤ 30, λ ≤ 14. The parameter combinations of further tables are given in Ragha-
varao (1971) for v ≤ 100, k ≤ 15, λ ≤ 15; in Collins (1976) for v ≤ 50, k ≤ 23, λ ≤ 11; in
Mathon and Rosa (2006) for r ≤ 41 and in Mohan et al. (2004) for v ≤ 111, k ≤ 55,
λ ≤ 30 (Colbourn and Dinitz, 2006).


12.2.2. Construction Methods of BIBD 


In this section we show the multiplicity of methods of the construction of BIBD, 
but these are not exhaustive. Further methods are, for instance, given in Abel 
et al. (2004) or in Rasch et al. (2011). In the latter R-programme, methods 
are described using difference sets and difference families, not described here. 


Definition 12.12 Let p be a prime. Then for an integer h put $s = p^h$. Each
ordered set $X = (x_0, \dots, x_n)$ of n + 1 elements $x_i$ (not all zero) of a Galois field GF(s)
is a point of a (finite) projective geometry PG(n,s). Two sets $Y = (y_0, \dots, y_n)$ and
$X = (x_0, \dots, x_n)$ with $y_i = q x_i$ (i = 0, ..., n) and an element q of the GF(s) unequal
0 represent the same point. The elements $x_i$ (i = 0, ..., n) of X are coordinates
of X. All points of a PG(n,s), fulfilling the n - m linear independent homogeneous
equations $\sum_{i=0}^{n} a_{ji} x_i = 0$; j = 1, ..., n - m; $a_{ji} \in GF(s)$, create an m-dimensional
subspace of the PG(n,s). Subspaces with $x_0 = 0$ are subspaces in the infinite. In
a PG(n,s) there are $Q_n = \frac{s^{n+1}-1}{s-1}$ different points and $Q_m = \frac{s^{m+1}-1}{s-1}$ points in
each m-dimensional subspace. The number of m-dimensional subspaces of a
PG(n,s) is

$$\varphi(n,m,s) = \frac{(s^{n+1}-1)(s^{n}-1)\cdots(s^{n-m+1}-1)}{(s^{m+1}-1)(s^{m}-1)\cdots(s-1)}, \quad (m \ge 0;\ n \ge m). \qquad (12.7)$$

The number of different m-dimensional subspaces of a PG(n,s), having one point
in common, is

$$\varphi(n,m,s)\,\frac{s^{m+1}-1}{s^{n+1}-1} \quad (= \varphi(n-1,m-1,s) \text{ if } m \ge 1).$$

The number of different m-dimensional subspaces of a PG(n,s) with two differ-
ent points in common is

$$\varphi(n,m,s)\,\frac{(s^{m+1}-1)(s^{m}-1)}{(s^{n+1}-1)(s^{n}-1)} \quad (= \varphi(n-2,m-2,s), \text{ if } m \ge 2).$$


Method 12.1 We construct a PG(n,s) and consider its points as the v treat-
ments and for each m, the m-dimensional subspaces as blocks. This gives a
BIBD with

$$v = \frac{s^{n+1}-1}{s-1},$$
$$b = \varphi(n,m,s),$$
$$r = \frac{s^{m+1}-1}{s^{n+1}-1}\,\varphi(n,m,s),$$
$$k = \frac{s^{m+1}-1}{s-1},$$
$$\lambda = \frac{(s^{m+1}-1)(s^{m}-1)}{(s^{n+1}-1)(s^{n}-1)}\,\varphi(n,m,s),$$

where φ(n, m, s) is defined in Definition 12.12.
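
The parameter formulas of Method 12.1 can be evaluated with exact integer arithmetic. The following Python sketch is only an aid for checking candidate parameters; the names `phi` and `pg_bibd_parameters` are hypothetical, and s is assumed to be a prime power.

```python
def phi(n, m, s):
    """Number of m-dimensional subspaces of PG(n, s), Formula (12.7)."""
    num, den = 1, 1
    for i in range(m + 1):
        num *= s ** (n + 1 - i) - 1      # (s^(n+1)-1)(s^n-1)...(s^(n-m+1)-1)
        den *= s ** (m + 1 - i) - 1      # (s^(m+1)-1)(s^m-1)...(s-1)
    return num // den


def pg_bibd_parameters(n, m, s):
    """(v, b, r, k, lam) of the BIBD from the m-dimensional subspaces of PG(n, s)."""
    b = phi(n, m, s)
    v = (s ** (n + 1) - 1) // (s - 1)
    k = (s ** (m + 1) - 1) // (s - 1)
    r = (s ** (m + 1) - 1) * b // (s ** (n + 1) - 1)
    lam = ((s ** (m + 1) - 1) * (s ** m - 1) * b
           // ((s ** (n + 1) - 1) * (s ** n - 1)))
    return v, b, r, k, lam


print(pg_bibd_parameters(3, 2, 2))   # planes of PG(3, 2)
print(pg_bibd_parameters(2, 1, 2))   # lines of PG(2, 2)
```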


Example 12.6 We construct a PG(3,2) with s = p = 2; h = 1 and n = 3. The
GF(2) is {0, 1}; a minimal function we do not need, because h = 1. The 15 ele-
ments (treatments) of the PG(3,2) are all possible combinations of (0;1)-values
in $X = (x_0, \dots, x_3)$ with the exception of (0,0,0,0):

{(1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1), (1,1,0,0), (1,0,1,0), (1,0,0,1), (0,1,1,0),
(0,1,0,1), (0,0,1,1), (1,1,1,0), (1,1,0,1), (1,0,1,1), (0,1,1,1), (1,1,1,1)}.

With m = 2 the equation (n - m = 1) for the two dimensional subspaces is
$a_0 x_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 = 0$ with all combinations of coefficients of the GF(2)
(except (0,0,0,0)). These are just the same quadruples as the 15 points above.
We create now a (15 × 15) matrix with rows defined by the treatments and col-
umns defined by the subspaces (blocks). In each cell of the matrix, we insert a 1
if the point lies in the block and a 0 otherwise. We consider the first block
defined by $x_0 = 0$. All points with 0 at the first place are in that block. These are
the points 2, 3, 4, 8, 9, 10 and 14. The second equation is $x_1 = 0$. In that block
are all points with 0 as the second entry. These are the points 1, 3, 4, 6, 7,
10 and 13. So we continue with all 15 blocks and receive the symmetric BIBD with
v = b = 15, r = k = 7 and λ = 3.

Block   Treatments
 1       1   2   4   5   8  10  15
 2       2   3   5   6   9  11   1
 3       3   4   6   7  10  12   2
 4       4   5   7   8  11  13   3
 5       5   6   8   9  12  14   4
 6       6   7   9  10  13  15   5
 7       7   8  10  11  14   1   6
 8       8   9  11  12  15   2   7
 9       9  10  12  13   1   3   8
10      10  11  13  14   2   4   9
11      11  12  14  15   3   5  10
12      12  13  15   1   4   6  11
13      13  14   1   2   5   7  12
14      14  15   2   3   6   8  13
15      15   1   3   4   7   9  14
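
The construction described in the text of Example 12.6 (one block per equation with coefficients in GF(2)) can be reproduced as follows. This is a sketch under the point numbering used in the text; the variable and function names are hypothetical. The blocks obtained in this way are the hyperplane blocks of the text, which need not be numbered like the rows of the table.

```python
from collections import Counter
from itertools import combinations

# The 15 points of PG(3, 2) in the order listed in Example 12.6.
POINTS = [
    (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (1, 1, 0, 0),
    (1, 0, 1, 0), (1, 0, 0, 1), (0, 1, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1),
    (1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1),
]


def hyperplane_blocks(points):
    """One block per coefficient vector a: all points x with a.x = 0 in GF(2)."""
    blocks = []
    for a in points:                      # coefficient vectors = the 15 points
        block = tuple(
            i + 1 for i, x in enumerate(points)
            if sum(ai * xi for ai, xi in zip(a, x)) % 2 == 0
        )
        blocks.append(block)
    return blocks


BLOCKS = hyperplane_blocks(POINTS)
PAIR_COUNTS = Counter(p for blk in BLOCKS for p in combinations(blk, 2))

print(BLOCKS[0], BLOCKS[1])
print(set(len(blk) for blk in BLOCKS), set(PAIR_COUNTS.values()))
```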


Definition 12.13 Let p be a prime. Then for an integer h put $s = p^h$. Each
ordered set $X = (x_1, \dots, x_n)$ of n elements $x_i$ of a GF(s) is a point of a (finite)
Euclidean geometry EG(n,s). Two sets $Y = (y_1, \dots, y_n)$ and $X = (x_1, \dots, x_n)$ with
$y_i = x_i$ (i = 1, ..., n) represent the same point. The elements $x_i$ (i = 1, ..., n) of X
are coordinates of X. All points of an EG(n,s), fulfilling the n - m linear
independent equations $\sum_{i=0}^{n} a_{ji} x_i = 0$; j = 1, ..., n - m; $a_{ji} \in GF(s)$ and $x_0 \equiv 1$,
create an m-dimensional subspace of the EG(n,s).

In EG(n,s) there are $s^n$ different points and $s^m$ points in each m-dimensional
subspace. The number of m-dimensional subspaces of an EG(n,s) passing
through one fixed point is

$$\varphi(n-1, m-1, s).$$

The number of different m-dimensional subspaces of an EG(n,s) with two dif-
ferent points in common is

$$\varphi(n-2, m-2, s).$$
Method 12.2 We can construct an EG(n,s) and consider its points as v treat-
ments and for each m, the m-dimensional subspaces as blocks. This gives a
BIBD with

$$v = s^{n},$$
$$b = \varphi(n,m,s) - \varphi(n-1,m,s),$$
$$r = \frac{s^{m+1}-1}{s^{n+1}-1}\,\varphi(n,m,s),$$
$$k = s^{m},$$
$$\lambda = \frac{(s^{m+1}-1)(s^{m}-1)}{(s^{n+1}-1)(s^{n}-1)}\,\varphi(n,m,s).$$


Example 12.7 We construct an EG(3,2) with s = p = 2; h = 1, n = 3 and m = 2.
The parameters of the block design are

$$v = 2^3 = 8,$$
$$b = \varphi(3,2,2) - \varphi(2,2,2) = 15 - 1 = 14,$$
$$r = \frac{s^3-1}{s^4-1}\,\varphi(3,2,2) = \frac{7}{15}\cdot 15 = 7,$$
$$k = s^2 = 4,$$
$$\lambda = \frac{(s^3-1)(s^2-1)}{(s^4-1)(s^3-1)}\,\varphi(3,2,2) = 3,$$

and the block design is

Block   Treatments
 1      1  3  5  7
 2      1  2  5  6
 3      1  4  5  8
 4      1  2  3  4
 5      1  3  6  8
 6      1  2  7  8
 7      1  4  6  7
 8      2  4  6  8
 9      3  4  7  8
10      2  3  6  7
11      5  6  7  8
12      2  4  5  7
13      3  4  5  6
14      2  3  5  8

Method 12.3 If N is the incidence matrix of a BIBD with parameters
v = b = 4l + 3, r = k = 2l + 1 and λ = l (l = 1, 2, ...)
and if $\bar{N}$ is the incidence matrix of the complementary BIBD, then the matrix

$$N^* = \begin{pmatrix} N & \bar{N} \\ 1_v^T & 0_v^T \end{pmatrix}$$

is the incidence matrix of a BIBD with (v, b, r, k, λ) = (4l + 4, 8l + 6, 4l + 3, 2l + 2,
2l + 1).


Example 12.8 Let l = 1 then

N =
1110000
1001100
0101010
1000011
0011001
0010110
0100101

and $\bar{N}$ =
0001111
0110011
1010101
0111100
1100110
1101001
1011010

This results in

N* =
11100000001111
10011000110011
01010101010101
10000110111100
00110011100110
00101101101001
01001011011010
11111110000000


This is the incidence matrix of a BIBD with v = 8, b = 14, r = 7, k = 4 and λ = 3,
and it is isomorphic with that in Example 12.7.
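
The step from N to N* in Method 12.3 can be written down directly. In the sketch below the function name `extend_by_complement` is hypothetical, and the matrix N is the one of Example 12.8 (l = 1); the row of ones is placed under N and the row of zeros under the complementary matrix, as in the example.

```python
# Incidence matrix N of Example 12.8 (v = b = 7, r = k = 3, lambda = 1).
N = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
]


def extend_by_complement(n):
    """Rows (N | complement of N), followed by one row (1 ... 1 | 0 ... 0)."""
    b = len(n[0])
    top = [row + [1 - x for x in row] for row in n]
    return top + [[1] * b + [0] * b]


N_STAR = extend_by_complement(N)

# lambda is read off as the inner product of two different rows of N*.
LAMBDAS = {
    sum(p * q for p, q in zip(N_STAR[i], N_STAR[j]))
    for i in range(len(N_STAR)) for j in range(i)
}
print(len(N_STAR), len(N_STAR[0]), LAMBDAS)
```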


As we have seen, different methods can lead to the same block design.
We now need the minimal functions of a GF($p^h$) as presented in Table 12.3.
A minimal function P(x) can be used to generate the elements of a GF($p^h$). We
need the function

$$f(x) = a_0 + a_1 x + \dots + a_{h-1} x^{h-1}$$

with integer coefficients $a_i$ (i = 0, ..., h - 1) as the elements of a GF(p). The
function

$$F(x) = f(x) + p\,q(x) + P(x)\,Q(x) \qquad (12.8)$$

with the minimal function P(x) and certain polynomials q(x) and Q(x) creates a
class of functions, the residues modulo p and P(x). We write

$$F(x) \equiv f(x) \pmod{p;\ P(x)}. \qquad (12.9)$$

If p and P(x) are fixed and f(x) is variable, F(x) generates just $p^h$ classes (func-
tions) representing a GF($p^h$) iff p is prime and P(x) is a minimal function of
GF($p^h$).


Table 12.3 Minimal functions P(x) of a GF($p^h$).

p    h   P(x)
2    2   x^2 + x + 1
2    3   x^3 + x^2 + 1
2    4   x^4 + x^3 + 1
2    5   x^5 + x^3 + 1
2    6   x^6 + x^5 + 1
3    2   x^2 + x + 2
3    3   x^3 + 2x + 1
3    4   x^4 + x + 2
3    5   x^5 + 2x^4 + 1
3    6   x^6 + x^5 + 2
5    2   x^2 + 2x + 3
5    3   x^3 + x^2 + 2
5    4   x^4 + x^3 + 2x^2 + 2
5    5   x^5 + x^2 + 2
5    6   x^6 + x^5 + 2
7    2   x^2 + x + 3
7    3   x^3 + x^2 + x + 2
7    4   x^4 + x^3 + x^2 + 3
7    5   x^5 + x^4 + 4
7    6   x^6 + x^5 + x^4 + 3
11   2   x^2 + x + 7
11   3   x^3 + x^2 + 3
11   4   x^4 + 4x^3 + 2
11   5   x^5 + x^3 + x^2 + 9
13   2   x^2 + x + 2
13   3   x^3 + x^2 + 2
13   4   x^4 + x^3 + 3x^2 + 2
17   2   x^2 + x + 3
17   3   x^3 + x + 3
17   4   x^4 + 4x^2 + x + 3
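
How the residue classes of (12.9) form a field can be made concrete for p = 3 and the minimal function x^2 + x + 2 of Table 12.3. The following sketch represents the class of a_0 + a_1 x by the pair (a_0, a_1); it is restricted to h = 2, and all names are hypothetical.

```python
P_MOD = 3          # the prime p
C0, C1 = 2, 1      # minimal function x^2 + C1*x + C0, here x^2 + x + 2


def add(a, b):
    """Sum of two residue classes, coefficients reduced modulo p."""
    return ((a[0] + b[0]) % P_MOD, (a[1] + b[1]) % P_MOD)


def mul(a, b):
    """Product of a0 + a1 x and b0 + b1 x, with x^2 replaced by -C1*x - C0."""
    hi = a[1] * b[1]                      # coefficient of x^2 before reduction
    return ((a[0] * b[0] - C0 * hi) % P_MOD,
            (a[0] * b[1] + a[1] * b[0] - C1 * hi) % P_MOD)


ELEMENTS = [(a0, a1) for a0 in range(P_MOD) for a1 in range(P_MOD)]


def powers_of_x():
    """x, x^2, ..., x^8 as residue classes."""
    out, cur = [], (1, 0)
    for _ in range(8):
        cur = mul(cur, (0, 1))
        out.append(cur)
    return out


print(mul((0, 1), (0, 1)))    # x^2 as a residue class
print(powers_of_x())
```

In this representation x^2 is the class 1 + 2x, and the powers of x run through all eight non-zero classes before returning to 1.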


Method 12.4 If $v = p^m$, where p is prime and m a natural number, with the
elements of a GF $\{\alpha_0 = 0; \alpha_1 = 1; \dots, \alpha_{v-1}\}$, we construct v - 1 LS (see
Section 12.3) $A_l = (a_{ij}^{(l)})$; l = 1, ..., v - 1 as follows: $A_1 = (a_{ij}^{(1)})$ is the addition
table of a group, the elements of $A_t = (a_{ij}^{(t)})$; t = 2, ..., v - 1 are $a_{ij}^{(t)} = a_{ij}^{(1)} \cdot \alpha_t$. We
construct the (v × v(v - 1))-matrix $A = (A_1, \dots, A_{v-1})$. With the desired block size k,
we choose k different elements from the GF. Each column of A defines one block
of the BIBD; its elements are just the row numbers of A of the k selected ele-
ments of the GF. If each block occurs w ≥ 2 times, we delete w - 1 copies. To
find out whether blocks occur more than once, we order the elements in the
blocks lexicographically. The parameters of the original BIBD are

v = $p^m$; b = v(v - 1); r = k(v - 1); k; λ = k(k - 1).

The reduced BIBD then has the parameters

v* = v; b* = b/w; r* = r/w; k* = k; λ* = λ/w.


Example 12.9 We try to construct a BIBD with v = 9. For v = 9 = $3^2$ we have p = 3;
m = 2. The minimal function is $x^2 + x + 2$ and $f(x) = a_0 + a_1 x$ with coefficients
$a_i$; i = 0, 1 from GF(3) = {0, 1, 2}. The function $F(x) \equiv f(x) \pmod{3;\ x^2 + x + 2}$
gives the nine elements of GF(9) for all values of f(x):

a_0   a_1   f(x) = F(x)
0     0     α_0 = 0
0     1     α_2 = x
0     2     α_3 = 2x
1     0     α_1 = 1
1     1     α_4 = 1 + x
1     2     α_5 = x^2 = 1 + 2x
2     0     α_6 = 2
2     1     α_7 = 2 + x
2     2     α_8 = 2 + 2x

The addition table of GF(9) is a LS:

0      1      x      2x     1+x    1+2x   2      2+x    2+2x
1      2      1+x    1+2x   2+x    2+2x   0      x      2x
x      1+x    2x     0      1+2x   1      2+x    2+2x   2
2x     1+2x   0      x      1      1+x    2+2x   2      2+x
1+x    2+x    1+2x   1      2+2x   2      x      2x     0
1+2x   2+2x   1      1+x    2      2+x    2x     0      x
2      0      2+x    2+2x   x      2x     1      1+x    1+2x
2+x    x      2+2x   2      2x     0      1+x    1+2x   1
2+2x   2x     2      2+x    0      x      1+2x   1      1+x

The seven other matrices $A_2, \dots, A_8$ are obtained by multiplying every entry of the
addition table by $\alpha_2 = x$, $\alpha_3 = 2x$, ..., $\alpha_8 = 2 + 2x$ in turn and reducing modulo 3 and
$x^2 + x + 2$ (so that $x^2 = 1 + 2x$). At first we multiply with $\alpha_2 = x$ and obtain

0      x      1+2x   2+x    1      2+2x   2x     1+x    2
x      2x     1      2+2x   1+x    2      0      1+2x   2+x
1+2x   1      2+x    0      2+2x   x      1+x    2      2x
2+x    2+2x   0      1+2x   x      1      2      2x     1+x
1      1+x    2+2x   x      2      2x     1+2x   2+x    0
2+2x   2      x      1      2x     1+x    2+x    0      1+2x
2x     0      1+x    2      1+2x   2+x    x      1      2+2x
1+x    1+2x   2      2x     2+x    0      1        2+2x   x
2      2+x    2x     1+x    0      1+2x   2+2x   x      1

The matrices $A_3, \dots, A_8$ are formed in the same way.

We now choose the four elements 0; 1; 2; x and get the blocks (1,2,3,7) [from the
first row of the addition table] and

(1,2,7,8); (1,4,6,9); (3,4,5,8); (4,6,7,9); (3,5,8,9); (1,2,5,7); (2,4,6,9); (3,5,6,8).

From the next matrix we get

(1,2,5,9); (1,3,6,7); (2,4,6,8); (3,5,6,7); (1,4,5,9); (2,3,4,8); (2,4,7,8); (3,6,7,9); (1,5,8,9).

We continue in this way and get

(1,5,7,9); (2,3,6,7); (2,4,8,9); (3,6,7,8); (1,5,6,9); (2,4,5,8); (1,2,4,8); (3,4,6,7); (1,3,5,9);
(1,3,4,6); (4,7,8,9); (1,3,4,5); (1,2,3,4); (3,7,8,9); (1,7,8,9); (2,5,6,9); (2,5,6,8); (2,5,6,7);
(1,5,6,8); (3,4,5,7); (2,4,5,7); (2,3,6,9); (1,2,3,9); (1,4,6,8); (2,3,8,9); (1,6,7,8); (4,5,7,9);
(1,2,4,7); (1,2,7,9); (3,4,6,9); (1,3,5,8); (4,6,8,9); (3,5,7,8); (1,2,6,7); (4,5,6,9); (2,3,5,8);
(1,6,8,9); (4,5,6,7); (4,5,7,8); (2,3,7,9); (2,3,5,9); (1,2,6,8); (2,3,4,9); (1,3,6,8); (1,4,5,7);
(1,3,4,8); (5,7,8,9); (1,3,4,7); (1,3,4,9); (2,7,8,9); (6,7,8,9); (2,3,5,6); (1,2,5,6); (2,4,5,6).

All blocks are different, which means w = 1 and v = 9; b = 72; r = 32; k = 4; λ = 12.

We know from Theorem 12.3 that for k = 4 a BIBD exists whose parameters
fulfil the necessary conditions [v = 9; b = 18; r = 8; k = 4; λ = 3]. This shows that
Method 12.4, even for w = 1, does not necessarily lead to a smallest BIBD. We
therefore recommend using this method only if no other method for the pair
(v, k) is available.
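
The construction of Example 12.9 can be retraced numerically. The following Python sketch is only an illustration under stated assumptions: GF(9) elements a_0 + a_1 x are stored as pairs (a_0, a_1), every matrix is taken to be the addition table with each entry multiplied by one non-zero element (which is how the printed matrices read), and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Elements a0 + a1*x of GF(9) as pairs (a0, a1), in the column order of the
# addition table: 0, 1, x, 2x, 1+x, 1+2x, 2, 2+x, 2+2x.
ELEMENTS = [(0, 0), (1, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

def add(a, b):
    return ((a[0] + b[0]) % 3, (a[1] + b[1]) % 3)

def mul(a, b):
    # Reduce with x^2 = 1 + 2x, which follows from x^2 + x + 2 = 0 (mod 3).
    c0 = a[0] * b[0] + a[1] * b[1]
    c1 = a[0] * b[1] + a[1] * b[0] + 2 * a[1] * b[1]
    return (c0 % 3, c1 % 3)

def square(multiplier):
    """Addition table of GF(9) with every entry multiplied by `multiplier`."""
    return [[mul(multiplier, add(r, c)) for c in ELEMENTS] for r in ELEMENTS]

def blocks_of(sq, chosen):
    """One block per row: the 1-based columns that hold a chosen element."""
    return [tuple(k + 1 for k, e in enumerate(row) if e in chosen) for row in sq]

chosen = {(0, 0), (1, 0), (2, 0), (0, 1)}      # the elements 0, 1, 2, x
blocks = [b for m in ELEMENTS[1:] for b in blocks_of(square(m), chosen)]
pair_counts = Counter(p for b in blocks for p in combinations(b, 2))

print(len(blocks), len(set(blocks)), set(pair_counts.values()))
```

Under these assumptions the sketch yields 72 distinct blocks in which every pair of treatments occurs 12 times, in agreement with the parameters stated for the example.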


Method 12.5 A BIBD with parameters v = s^2, b = s(s + 1), k = s can be partitioned
into s + 1 groups with s blocks each. The blocks of the groups 2 to s + 1 are
included (s − 1) times in the BIBD to be constructed. The blocks from group 1
occur once. Finally, the set obtained in this way is complemented by all
(s − 1)-tuples from the blocks of group 1, each extended by the treatment v + 1.
The BIBD constructed in this way has the parameters

v = s^2 + 1; k = s; b = s(s^2 + 1); r = s^2; λ = s − 1.

Example 12.10 We construct a BIBD with parameters v = 10; k = 3; b = 30;
r = 9; λ = 2. The BIBD with the parameters v = 9; k = 3; b = 12; r = 4; λ = 1 (s = 3)
is written in four groups:


Group 1: (1,2,6), (3,4,5), (7,8,9);
Group 2: (1,3,7), (2,4,9), (5,6,8);
Group 3: (1,4,8), (2,5,7), (3,6,9);
Group 4: (1,5,9), (2,3,8), (4,6,7).


The blocks of groups 2 to 4 are used twice for the BIBD to be constructed, and
group 1 is used once, giving altogether 21 blocks. The nine pairs (1,2), (1,6),
(2,6), (3,4), (3,5), (4,5), (7,8), (7,9) and (8,9) from the blocks of group 1 are
complemented by the treatment 10, giving nine more blocks and finally the design
with v = 10; k = 3; b = 30; r = 9; λ = 2.

In this BIBD some (but not all) blocks occur repeatedly. 
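
The steps of Example 12.10 can be checked with a few lines of Python. This is a minimal sketch that uses the four groups exactly as printed; the variable names are illustrative only.

```python
from collections import Counter
from itertools import combinations

groups = [
    [(1, 2, 6), (3, 4, 5), (7, 8, 9)],   # group 1
    [(1, 3, 7), (2, 4, 9), (5, 6, 8)],   # group 2
    [(1, 4, 8), (2, 5, 7), (3, 6, 9)],   # group 3
    [(1, 5, 9), (2, 3, 8), (4, 6, 7)],   # group 4
]
s = 3
new_treatment = s * s + 1                # treatment v + 1 = 10

design = list(groups[0])                 # group 1 is used once
for g in groups[1:]:
    design += g * (s - 1)                # groups 2 to s + 1 are used (s - 1) times
for block in groups[0]:
    for sub in combinations(block, s - 1):
        design.append(sub + (new_treatment,))   # (s - 1)-tuples plus treatment 10

pair_counts = Counter(p for b in design for p in combinations(sorted(b), 2))
replications = Counter(t for b in design for t in b)
print(len(design), set(pair_counts.values()), set(replications.values()))
```

The sketch gives 30 blocks, with every pair occurring twice and every treatment nine times, as stated for the constructed design.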


Definition 12.14 A square matrix H_n of order n with elements −1 and +1 is a
Hadamard matrix, if H_n H_n^T = n I_n.

A necessary condition for the existence of a Hadamard matrix for n > 2 is
n ≡ 0 (mod 4). This necessary condition is sufficient for all n < 201.

Trivially

H_1 = (1);   H_2 = ( 1   1 )
                   ( 1  −1 ).

Each Hadamard matrix can w.l.o.g. be written in a normal form in which the
first row and the first column contain only the elements +1. The Kronecker
product H_{n_1} ⊗ H_{n_2} = H_{n_1 n_2} of two Hadamard matrices H_{n_1}, H_{n_2} is a Hadamard
matrix of order n_1 n_2.


Method 12.6 Let H be a Hadamard matrix of order n = 4t in normal form and
B be the matrix obtained from H by deleting the first row and the first column. In B
we replace the elements −1 by 0 and obtain the incidence matrix of a BIBD with
v = b = 4t − 1; r = k = 2t − 1; λ = t − 1.

Example 12.11 A BIBD with v = b = 15; r = k = 7; λ = 3 is obtained from a
Hadamard matrix of order 16 (t = 4) in normal form


 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 1 -1  1 -1  1 -1  1 -1  1 -1  1 -1  1 -1  1 -1
 1  1 -1 -1  1  1 -1 -1  1  1 -1 -1  1  1 -1 -1
 1 -1 -1  1  1 -1 -1  1  1 -1 -1  1  1 -1 -1  1
 1  1  1  1 -1 -1 -1 -1  1  1  1  1 -1 -1 -1 -1
 1 -1  1 -1 -1  1 -1  1  1 -1  1 -1 -1  1 -1  1
 1  1 -1 -1 -1 -1  1  1  1  1 -1 -1 -1 -1  1  1
 1 -1 -1  1 -1  1  1 -1  1 -1 -1  1 -1  1  1 -1
 1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1
 1 -1  1 -1  1 -1  1 -1 -1  1 -1  1 -1  1 -1  1
 1  1 -1 -1  1  1 -1 -1 -1 -1  1  1 -1 -1  1  1
 1 -1 -1  1  1 -1 -1  1 -1  1  1 -1 -1  1  1 -1
 1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1  1  1  1  1
 1 -1  1 -1 -1  1 -1  1 -1  1 -1  1  1 -1  1 -1
 1  1 -1 -1 -1 -1  1  1 -1 -1  1  1  1  1 -1 -1
 1 -1 -1  1 -1  1  1 -1 -1  1  1 -1  1 -1 -1  1

We delete the first row and the first column and replace −1 by 0:

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
0 0 1 1 0 0 1 1 0 0 1 1 0 0 1
1 1 1 0 0 0 0 1 1 1 1 0 0 0 0
0 1 0 0 1 0 1 1 0 1 0 0 1 0 1
1 0 0 0 0 1 1 1 1 0 0 0 0 1 1
0 0 1 0 1 1 0 1 0 0 1 0 1 1 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
0 1 0 1 0 1 0 0 1 0 1 0 1 0 1
1 0 0 1 1 0 0 0 0 1 1 0 0 1 1
0 0 1 1 0 0 1 0 1 1 0 0 1 1 0
1 1 1 0 0 0 0 0 0 0 0 1 1 1 1
0 1 0 0 1 0 1 0 1 0 1 1 0 1 0
1 0 0 0 0 1 1 0 0 1 1 1 1 0 0
0 0 1 0 1 1 0 0 1 1 0 1 0 0 1

This is the incidence matrix of the design:

(2,4,6,8,10,12,14); (1,4,5,8,9,12,13); (3,4,7,8,11,12,15); (1,2,3,8,9,10,11); (2,5,7,8,10,13,15);
(1,6,7,8,9,14,15); (3,5,6,8,11,13,14); (1,2,3,4,5,6,7); (2,4,6,9,11,13,15); (1,4,5,10,11,14,15);
(3,4,7,9,10,13,14); (1,2,3,12,13,14,15); (2,5,7,9,11,12,14); (1,6,7,10,11,12,13); (3,5,6,9,10,12,15).


We find that all pairs occur three times and each element seven times. 
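
Method 12.6 can be sketched in Python as follows. The sketch assumes that the Hadamard matrix of order 16 is the one obtained by repeated Kronecker products with H_2 (Sylvester's construction); this choice is an assumption, although it reproduces the blocks listed in Example 12.11. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def sylvester(order):
    """Hadamard matrix of order 2^q from repeated Kronecker products with H_2."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-e for e in row] for row in h]
    return h

H = sylvester(16)
n = len(H)

# Defining property H H^T = n I_n.
assert all(
    sum(H[i][c] * H[j][c] for c in range(n)) == (n if i == j else 0)
    for i in range(n) for j in range(n)
)

# Delete first row and first column, replace -1 by 0.
B = [[(e + 1) // 2 for e in row[1:]] for row in H[1:]]
blocks = [tuple(j + 1 for j, e in enumerate(row) if e == 1) for row in B]
pair_counts = Counter(p for b in blocks for p in combinations(b, 2))
print(blocks[0], set(pair_counts.values()))
```

With this matrix each of the 105 treatment pairs occurs in three blocks and each block has seven elements, matching v = b = 15, r = k = 7, λ = 3.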


Method 12.7 Let v = p^n = m(λ − 1) + 1, p a prime, m ≥ 1 and x a primitive
element of GF(v). The blocks (0, x^i, x^{i+m}, x^{i+2m}, ..., x^{i+(λ−2)m}), i = 0, ..., m − 1 are
so-called initial blocks. From these initial blocks we construct a BIBD with v;
b = mv; k = λ; r = mk; λ by adding modulo p and then increasing all elements by 1.

Example 12.12 We construct a BIBD for v = p = 29 = 7·4 + 1, with m = 7,
λ = 5. The initial blocks are (0, x^i, x^{i+7}, x^{i+14}, x^{i+21}), i = 0, ..., 6; a primitive
element of GF(29) is x = 2. We obtain a BIBD with b = 203 blocks, k = λ = 5
and r = 35. The initial block for i = 0 is, for instance,
(0, 1, 2^7 = 128 = 12, 2^14 = 28, 2^21 = 17), all powers taken modulo 29. Adding 1
to these treatments, we obtain the next of the 29 blocks of this initial block,
namely (1,2,13,0,18). Increasing all treatments by 1 turns these first two blocks
into (1,2,13,18,29) and (1,2,3,14,19). Thus all 203 blocks can be generated.
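
As an illustration, the 203 blocks of Example 12.12 can be generated and checked with the following Python sketch. It works with integers modulo 29 (v = p is a prime here, so no extension field is needed); the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

p, m, lam, x = 29, 7, 5, 2          # values of Example 12.12

# Initial blocks (0, x^i, x^(i+m), ..., x^(i+(lam-2)m)) for i = 0, ..., m-1.
initial_blocks = [
    [0] + [pow(x, i + j * m, p) for j in range(lam - 1)]
    for i in range(m)
]

# Develop every initial block modulo p, then relabel 0..28 as 1..29.
blocks = [
    tuple(sorted((e + shift) % p + 1 for e in ib))
    for ib in initial_blocks
    for shift in range(p)
]

pair_counts = Counter(pr for b in blocks for pr in combinations(b, 2))
replications = Counter(t for b in blocks for t in b)
print(initial_blocks[0], blocks[0], blocks[1])
print(len(blocks), set(pair_counts.values()), set(replications.values()))
```

The first initial block comes out as (0, 1, 12, 28, 17), the first two developed blocks as (1,2,13,18,29) and (1,2,3,14,19), and all pairs occur five times.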


Method 12.8 In a symmetric BIBD with parameters v = b, k = r, λ, we delete
one block and then delete from all other blocks all the elements occurring in the
deleted block. By this we obtain a BIBD with parameters

v* = v − k, b* = v − 1, k* = k − λ, r* = k, λ* = λ.

If particularly v = b = 4t − 1, r = k = 2t − 1, λ = t − 1, we get a BIBD with

v* = 2t, b* = 4t − 2, k* = t, r* = 2t − 1, λ* = t − 1.

This BIBD is said to be a residual design to the initial BIBD.

Example 12.13 We start with the symmetric BIBD of Example 12.6 with
v = b = 15, r = k = 7 and λ = 3 and delete its first block and then in the other
blocks all treatments occurring in the deleted block (bold print in the scheme):


Block   Treatments

1       1   2   4   5   8  10  15
2       2   3   5   6   9  11   1
3       3   4   6   7  10  12   2
4       4   5   7   8  11  13   3
5       5   6   8   9  12  14   4
6       6   7   9  10  13  15   5
7       7   8  10  11  14   1   6
8       8   9  11  12  15   2   7
9       9  10  12  13   1   3   8
10     10  11  13  14   2   4   9
11     11  12  14  15   3   5  10
12     12  13  15   1   4   6  11
13     13  14   1   2   5   7  12
14     14  15   2   3   6   8  13
15     15   1   3   4   7   9  14


We rename the remaining eight treatments with 3 in 1, 6 in 2, 7 in 3, 9 in 4, 11 in
5, 12 in 6, 13 in 7, 14 in 8 and obtain the BIBD:


Block Treatments 

1 1 2 4 5 
2 1 2 3 6 
3 3 5 7 1 
4 2 4 6 8 
5 2 3 4 7 
6 3 5 8 2 
7 4 5 6 3 
8 4 6 7 1 
9 5 7 8 4 
10 5 6 8 1 
11 6 7 2 5 
12 7 8 3 6 
13 8 1 2 7 
14 1 3 4 8 


Method 12.9 From a symmetric BIBD with parameters v = b, k = r, λ, we
delete one block and in the remaining blocks we drop the treatments not
contained in this block. We obtain a BIBD with parameters v* = k, b* = v − 1, k* = λ,
r* = k − 1, λ* = λ − 1.

Example 12.14 We choose the design of Example 12.13 with v = b = 15,
r = k = 7 and λ = 3 and delete its first block and then in the other blocks all
treatments not occurring in the deleted block. The remaining treatments are
renamed 1 to 7 in ascending order, analogously to Example 12.13.
We obtain the blocks (1,2,4), (2,3,6), (3,4,5), (3,4,5), (4,6,7), (1,5,6), (2,5,7),
(1,5,6), (2,3,6), (4,6,7), (1,3,7), (1,2,4), (2,5,7) and (1,3,7).

In this BIBD each block occurs twice. We reduce it to a BIBD with 7 blocks by
dropping one copy of each block and obtain the BIBD:

(1,2,4), (1,3,7), (1,5,6), (2,3,6), (2,5,7), (3,4,5), (4,6,7) with v = 7, b = 7, k = r = 3
and λ = 1.
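
Methods 12.8 and 12.9 can be illustrated together. The Python sketch below assumes that the symmetric design of Example 12.13 is generated by shifting its first block (1,2,4,5,8,10,15) cyclically modulo 15, which is how the printed scheme reads; the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

base = (1, 2, 4, 5, 8, 10, 15)
design = [tuple((e - 1 + s) % 15 + 1 for e in base) for s in range(15)]

deleted = set(design[0])
others = design[1:]

def relabel(blocks, kept):
    """Keep only treatments in `kept` and rename them 1, 2, ... in ascending order."""
    new = {t: i + 1 for i, t in enumerate(sorted(kept))}
    return [tuple(sorted(new[t] for t in b if t in kept)) for b in blocks]

def lambdas(blocks):
    """Set of pair frequencies occurring in the design."""
    return set(Counter(p for b in blocks for p in combinations(b, 2)).values())

residual = relabel(others, set(range(1, 16)) - deleted)   # Method 12.8
derived = relabel(others, deleted)                        # Method 12.9

print(len(residual), lambdas(residual))
print(len(derived), lambdas(derived))
print(sorted(set(derived)))
```

The residual design has 14 blocks of size 4 with λ* = 3, and the derived design has 14 blocks of size 3 with λ* = 2; removing duplicates from the latter leaves the seven blocks of the reduced BIBD.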


12.2.3 Partially Balanced Incomplete Block Designs 


Partially balanced incomplete block designs are of less practical interest than 
completely balanced ones. They do not allow estimating all treatment differ- 
ences with equal precision. 


Definition 12.15 We consider v treatments 1, 2, ..., v. An association scheme
with m classes fulfils the conditions:

1) Two given treatments are either first, second, ... or mth associates.

2) Each treatment w in {1, 2, ..., v} has n_i ith associates (i = 1, ..., m); the number n_i
does not depend on w.

3) If the treatments w and z are ith associates, then the number of treatments
that are jth associates of w and lth associates of z is p^i_{jl}, independent of w and z.

For m = 2 we write this in the form of the matrices

P_1 = ( p^1_{11}  p^1_{12} )          P_2 = ( p^2_{11}  p^2_{12} )
      ( p^1_{21}  p^1_{22} )   and          ( p^2_{21}  p^2_{22} ).

The numbers v, n_i and p^i_{jl} are the parameters of the association scheme.


Definition 12.16 An incomplete, proper and equireplicate block design with v
treatments in b blocks with k < v elements each is a partially balanced
incomplete block design (PBIBD), if treatments w and z that are ith
associates occur together in exactly λ_i blocks, independent of the pair w
and z.

For a PBIBD we have, besides (12.4),

$\sum_{i=1}^{m} n_i = v - 1$    (12.10)

and in place of (12.5)

$\sum_{i=1}^{m} n_i \lambda_i = r(k - 1)$.    (12.11)

A BIBD is a special case of a PBIBD with m = 1. Then (12.10) and (12.11) become
(12.5). Of special interest are PBIBD(2) with two association classes. A part of
the treatment pairs then occurs together in exactly λ_1 and all the rest in exactly
λ_2 blocks. We give the following:


Example 12.15 We show a PBIBD with m = 2 and the parameters
v = 8, k = 3, b = 16, r = 6, λ_1 = 2, λ_2 = 1, n_1 = 5, n_2 = 2.

Block   Treatments

1       1  2  4
2       2  3  5
3       3  4  6
4       4  5  7
5       5  6  8
6       6  7  1
7       7  8  2
8       8  1  3
9       1  2  5
10      2  3  6
11      3  4  7
12      4  5  8
13      5  6  1
14      6  7  2
15      7  8  3
16      8  1  4

The five first associates of 1 are 2, 4, 5, 6, 8 and the two second associates are
3 and 7. Pairs such as (1,2), in which the partner is a first associate of 1, occur
twice; pairs such as (1,7), in which the partner is a second associate of 1, occur
only once. The PBIBD(2) with v = 8, k = 3 has only 16 blocks, the BIBD has 56.
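
The associate classes of Example 12.15 and the conditions (12.10) and (12.11) can be checked numerically. The sketch assumes that the sixteen blocks are the cyclic shifts modulo 8 of (1,2,4) and (1,2,5), which is the pattern visible in the table; names are illustrative.

```python
from collections import Counter
from itertools import combinations

v, r, k = 8, 6, 3
initial = [(1, 2, 4), (1, 2, 5)]
blocks = [tuple((e - 1 + s) % v + 1 for e in ib) for ib in initial for s in range(v)]

pair_counts = Counter(tuple(sorted(p)) for b in blocks for p in combinations(b, 2))

# Associates of treatment 1, classified by how often the pair occurs.
first = sorted(t for t in range(2, v + 1) if pair_counts[(1, t)] == 2)
second = sorted(t for t in range(2, v + 1) if pair_counts[(1, t)] == 1)
n1, n2 = len(first), len(second)

print(first, second)
print(n1 + n2 == v - 1, n1 * 2 + n2 * 1 == r * (k - 1))
```

Both conditions hold: n_1 + n_2 = 7 = v − 1 and n_1 λ_1 + n_2 λ_2 = 12 = r(k − 1).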

PBIBD(2) are given in Rasch et al. (2008). We now close the topic of construction
methods (with one exception) and define only some special cases, each with an example.


Example 12.16 Let

N_1 = N_2 = ( 1 1 0 )
            ( 1 0 1 )
            ( 0 1 1 )

be incidence matrices of two (identical) BIBD with parameters v = 3, b = 3,
k = 2, r = 2 and λ = 1. Then the incidence matrix of the Kronecker product design is

N = N_1 ⊗ N_2 =

1 1 0 1 1 0 0 0 0
1 0 1 1 0 1 0 0 0
0 1 1 0 1 1 0 0 0
1 1 0 0 0 0 1 1 0
1 0 1 0 0 0 1 0 1
0 1 1 0 0 0 0 1 1
0 0 0 1 1 0 1 1 0
0 0 0 1 0 1 1 0 1
0 0 0 0 1 1 0 1 1

This matrix is symmetric and the product NN^T equals

NN^T =

4 2 2 2 1 1 2 1 1
2 4 2 1 2 1 1 2 1
2 2 4 1 1 2 1 1 2
2 1 1 4 2 2 2 1 1
1 2 1 2 4 2 1 2 1
1 1 2 2 2 4 1 1 2
2 1 1 2 1 1 4 2 2
1 2 1 1 2 1 2 4 2
1 1 2 1 1 2 2 2 4

This matrix corresponds to a PBIBD(2) with parameters v = 9, b = 9, k = 4,
r = 4, λ_1 = 1, λ_2 = 2, and the necessary conditions (12.10) and (12.11) are fulfilled.
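
The Kronecker product design of Example 12.16 can be recomputed with plain Python lists. This is a small sketch without external libraries; `kron` and `gram` are hypothetical helper names.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def gram(n):
    """The product N N^T."""
    return [[sum(p * q for p, q in zip(r1, r2)) for r2 in n] for r1 in n]

N1 = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
N = kron(N1, N1)
C = gram(N)

# Diagonal entries give r, off-diagonal entries the pair frequencies.
r = C[0][0]
k = sum(row[0] for row in N)             # size of the first block (column sum)
n1 = sum(1 for j in range(1, 9) if C[0][j] == 1)
n2 = sum(1 for j in range(1, 9) if C[0][j] == 2)
print(C[0], n1, n2, n1 * 1 + n2 * 2 == r * (k - 1))
```

For every treatment there are four partners with pair frequency 1 and four with pair frequency 2, so that n_1 + n_2 = 8 = v − 1 and n_1 λ_1 + n_2 λ_2 = 12 = r(k − 1).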


We consider now some subgroups of PBIBD(2). 


Definition 12.17 A PBIBD(2) is said to be divisible, if v = qw and the
treatments can be arranged into q groups of w elements each so that pairs of
treatments in the same group occur in λ_1 blocks, and pairs of treatments not from the
same group occur in λ_2 blocks.

Example 12.17 A block design with v = 6, b = 4, k = 3, r = 2 and the blocks
(1,3,5), (1,4,6), (2,3,6) and (2,4,5) is a divisible PBIBD(2) with q = 3, λ_1 = 0,
λ_2 = 1 and the three groups [1, 2]; [3, 4]; [5, 6].


Definition 12.18 A PBIBD(2) is said to be simple, if one of the λ_i (i = 1, 2)
equals zero.

As we can see, the classes of PBIBD(2) in Definitions 12.17 and 12.18 can
contain the same design; the design of Example 12.17 is simple.

In the PBIBD(2) with the blocks (1,2,3), (4,5,6), (7,8,9), (1,4,7), (2,5,8), (3,6,9),
(1,5,9), (2,6,7) and (3,4,8), each of the v = 9 treatments occurs in three blocks
(r = 3); the b = 9 blocks are of size k = 3. Pairs of treatments occur either once
(λ_1 = 1) or not at all (λ_2 = 0) together in a block. Therefore the design is a simple
PBIBD(2).


Definition 12.19 A PBIBD(2) is said to be a triangular design, if v = u(u − 1)/2
and the treatments can be arranged in the upper triangle of a square
(u × u) matrix in such a way that, after the triangular matrix is completed by
reflection to a 'symmetric matrix' without a main diagonal, two treatments
in the same row or column occur λ_1 times and two treatments not in the same
row or column occur λ_2 times in the same block.

Triangular designs exist for v ≥ 6 only.


Example 12.18 The blocks (1,2,7,8,10), (1,3,5,9,10), (1,4,6,8,9), (2,3,6,7,9),
(2,4,5,6,10) and (3,4,5,7,8) are from a triangular design with parameters
v = 10, b = 6, k = 5, r = 3, λ_1 = 1, λ_2 = 2 and u = 5. Arranging the treatments as

*   1   2   3   4
1   *   5   6   7
2   5   *   8   9
3   6   8   *  10
4   7   9  10   *

pairs of treatments in the same row or column occur in one block, all others in
two blocks.


Definition 12.20 A PBIBD(2) is said to be cyclic, if v > 5, the PBIBD(2) is not
divisible and v = 4t + 1; n_1 = n_2 = 2t.

For cyclic designs the association matrices are

P_1 = ( t−1   t )          P_2 = ( t    t  )
      (  t    t )   and          ( t   t−1 ).

Example 12.19 We choose t = 3, so that v = 13. The association matrices are

P_1 = ( 2  3 )          P_2 = ( 3  3 )
      ( 3  3 )   and          ( 3  2 ).

Further n_1 = n_2 = 6. The condition (12.11) reads

n_1 λ_1 + n_2 λ_2 = 6(λ_1 + λ_2) = r(k − 1).

We get solutions (λ_1 = λ_2 is impossible, it gives a BIBD) as the following:

(λ_1 + λ_2) = 1, r = k = 3;
(λ_1 + λ_2) = 5, r = k = 6;
(λ_1 + λ_2) = 7, r = k = 7.


Each solution defines a cyclic PBIBD(2). Next we give the design for
λ_1 = 1, λ_2 = 0 and r = k = 3. The 13 blocks are the following:

(1,3,9), (1,6,8), (1,7,12), (2,4,10), (2,7,9), (2,8,13), (3,5,11), (3,8,10),
(4,6,12), (4,9,11), (5,7,13), (5,10,12), (6,11,13).

The 39 pairs

1,3; 1,6; 1,7; 1,8; 1,9; 1,12; 2,4; 2,7; 2,8; 2,9; 2,10; 2,13; 3,5; 3,8; 3,9;
3,10; 3,11; 4,6; 4,9; 4,10; 4,11; 4,12; 5,7; 5,10; 5,11; 5,12; 5,13; 6,8;
6,11; 6,12; 6,13; 7,9; 7,12; 7,13; 8,10; 8,13; 9,11; 10,12; 11,13

are first associates and occur once in the design; the other 39 do not occur.
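
The 13 blocks and 39 pairs of Example 12.19 can be regenerated in a few lines. The sketch assumes that the blocks are the cyclic shifts of (1,3,9) modulo 13, which reproduces the printed list; the variable names are illustrative.

```python
from itertools import combinations

v = 13
blocks = [tuple(sorted((e - 1 + s) % v + 1 for e in (1, 3, 9))) for s in range(v)]

# Pairs that occur together in a block are first associates (lambda_1 = 1).
pairs = sorted({p for b in blocks for p in combinations(b, 2)})
first_of_1 = sorted(b if a == 1 else a for a, b in pairs if 1 in (a, b))

total_pairs = v * (v - 1) // 2
print(len(pairs), total_pairs - len(pairs), first_of_1)
```

Of the 78 treatment pairs, 39 occur once and 39 do not occur; treatment 1 has the six first associates 3, 6, 7, 8, 9 and 12, in line with n_1 = 6.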


12.3 Row-Column Designs 


We consider now some RCD. These experimental designs are used to eliminate 
two nuisance factors in two directions written as rows and columns. 

The name RCD stems from the fact that the design can be characterized by a 
matrix so that its r rows correspond to the levels of one and its c columns cor- 
respond to the levels of the other nuisance factor. The elements represent the 
treatments. Construction and analysis depend on the special type of an RCD. 
The most important RCD are shown below: 


Row-column designs (RCD)
    Resolvable RCD
        Lattice squares
        Latinised RCD
    Non-resolvable RCD
        Latin squares
        Latin rectangles
        Youden designs
    Period designs (crossover designs)


Definition 12.21 Resolvable RCD are experimental designs with an arrangement
of v treatments in t matrices with r rows and c columns in such a way that
v = rc and all v treatments occur in each matrix. The matrices are not
understood as levels of a third nuisance factor; they are t replications with
changed order of the treatments in the matrices.

An important group of resolvable RCD are lattice squares with r = c and
consequently v a square number. If both the row blocks and the column blocks build
a BIBD, they are called balanced.

Another group is the Latinised RCD. An experimental design constructed
from t replications of a resolvable RCD with tr rows and c > t columns is said
to be columnwise Latinised, if no treatment occurs more often than once in a
column. Rowwise Latinised RCD are defined analogously.


Example 12.20 A balanced lattice square with r =c =3, v=9, t =3 is given by 
the replications 1—4 in the schema. 


1 2 3 4 

1 4 7 1 1 

4 2 9 4 6 
8 9 6 9 5 4 3 


Non-resolvable RCD are the LS and Latin rectangles (LR) and the Youden 
designs (YD). First we will consider LS. 


Definition 12.22 If in an experiment with v treatments a square matrix of
order v is given so that each of the v treatments A_1, ..., A_v occurs exactly once
in each row and in each column, we say it is an LS of side v. If the treatments are
in natural order, then an LS where the A_i in the first row and in the first column
are arranged in this natural order is called an LS in standard form. If the A_i are
arranged in this natural order only in the first row, we have an LS in
semi-standard form. In the LS, treatments are often represented by letters A, B, C, ....


Example 12.21 An LS of side seven is 


Gm oC te a er 
Me Qgnawtyt® 
S,Qmaubanms 
>Samaypinamab 
QAwtwe mm BOA 
S fp mm tf * & © 
moose Qqan 


Each complete Sudoku scheme is an LS of side nine with additional conditions.

Randomisation of LS means that we first must determine the set M of
possible standardised LS of a given side. From this set we randomly select one
element. Some of these LS are isomorphic, which means that one results
from the other by permutation of rows, columns and treatments. Otherwise,
they belong to different classes. Some elements of different classes are
conjugate, which means that one results from the other by interchanging rows
and columns. For instance, for v = 6 there exist 9408 different standardised
LS in 22 classes. In ten of these classes, two standardised LS per class are
conjugate. But for the randomisation we can simply select one
of the 9408 standardised LS at random. Then we randomly assign the levels of the
two nuisance factors to the rows and columns, respectively, and the
treatments to the numbers 1 to v.
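
The randomisation procedure just described can be sketched in Python. The following is an illustrative brute-force approach, not an efficient algorithm: it enumerates all standardised LS of a small side by backtracking, draws one at random and then permutes rows, columns and treatment labels at random. All function names are hypothetical.

```python
import random

def standardised_latin_squares(v):
    """All LS of side v whose first row and first column are 0, 1, ..., v-1."""
    square = [list(range(v))] + [[r] + [None] * (v - 1) for r in range(1, v)]
    found = []

    def fill(pos):
        if pos == (v - 1) * (v - 1):
            found.append([row[:] for row in square])
            return
        r, c = 1 + pos // (v - 1), 1 + pos % (v - 1)
        # Symbols already used to the left in row r and above in column c.
        used = set(square[r][:c]) | {square[i][c] for i in range(r)}
        for t in range(v):
            if t not in used:
                square[r][c] = t
                fill(pos + 1)
        square[r][c] = None

    fill(0)
    return found

def randomised_ls(v, rng=random):
    """Random standardised LS, then random rows, columns and treatment names."""
    base = rng.choice(standardised_latin_squares(v))
    rows, cols, names = (rng.sample(range(v), v) for _ in range(3))
    return [[names[base[r][c]] + 1 for c in cols] for r in rows]

print(len(standardised_latin_squares(4)), len(standardised_latin_squares(5)))
for row in randomised_ls(5):
    print(row)
```

For v = 4 and v = 5 the enumeration finds 4 and 56 standardised LS; for v = 6 the same call is expected to reproduce the 9408 squares quoted above, at the cost of a longer running time.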

Special LS are used in many applications. The next definition is given in Free- 
man (1979). 


Definition 12.23 Complete LS are LS with all ordered pairs of experimental 
units occurring next to each other once in each row and column. Quasi- 
complete LS are LS with all unordered pairs of experimental units occurring 
twice in each row and column. 

Bailey (1984) gave methods for the construction of quasi-complete LS and 
discussed randomisation problems. She could show that randomisation in a 
subset is valid while in the whole set is not. 


Definition 12.24 Two LS of side v with A = (a_ij) and B = (b_ij) (i, j = 1, ..., v;
a_ij ∈ {1, ..., v}, b_ij ∈ {1, ..., v}) are orthogonal, if each combination (f, g)
(f, g ∈ {1, ..., v}) occurs exactly once among the v^2 pairs (a_ij, b_ij). A set of m ≥ 2
LS of the same side is called a set of mutually orthogonal LS (MOLS), if all pairs
of this set are orthogonal.
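
Definition 12.24 translates directly into a check on the v^2 superimposed pairs. The Python sketch below tests it on the squares with entries (m·i + j) mod v for a prime side v; this particular family is a standard illustration chosen here as an assumption, not a construction taken from the text, and the function names are hypothetical.

```python
def orthogonal(a, b):
    """True if every pair (f, g) occurs exactly once when a and b are superimposed."""
    v = len(a)
    return len({(a[i][j], b[i][j]) for i in range(v) for j in range(v)}) == v * v

def mols_prime(v):
    """v - 1 LS with entries (m*i + j) mod v + 1, m = 1, ..., v-1; v is assumed prime."""
    return [
        [[(m * i + j) % v + 1 for j in range(v)] for i in range(v)]
        for m in range(1, v)
    ]

squares = mols_prime(7)
all_orthogonal = all(
    orthogonal(squares[p], squares[q])
    for p in range(len(squares)) for q in range(p + 1, len(squares))
)
print(len(squares), all_orthogonal)
```

For v = 7 this gives six mutually orthogonal LS, the maximal number v − 1.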


There exist at most v − 1 MOLS of side v. It is not fully clear how many MOLS
exist for a given v. Wilson (1974) showed that the maximal number of MOLS is ≥ 6
as long as v > 90 and for large v it is ≥ v^{1/17} − 2.

Up to v = 13 we have the following:

v                 3   4   5   6   7   8   9   10   11   12   13
Number of MOLS    2   3   4   1   6   7   8   ≥2   10   ≥5   12


The case v = 6 was investigated by Leonhard Euler (1782). Tsarina Catherine the
Great set Euler the task to arrange six regiments consisting of six officers each of
different ranks in a 6 × 6 square so that no rank or regiment is repeated in
any row or column. That means one has to construct two orthogonal LS of side
six. Euler conjectured that this is impossible. This conjecture was proved by
Tarry (1900, 1901). However, Euler's more general conjecture that no orthogonal
LS of side v = 4t + 2 exist was disproved. Bose and Shrikhande (1960)
showed that two orthogonal LS of side 10 exist.
Definition 12.25 An RCD with v treatments and two disturbance variables
with r and c levels, respectively, is said to be an LR, if 2 ≤ r ≤ v; 2 ≤ c ≤ v, and the
design can be written as an (r × c) matrix with v different elements (1, ..., v) in such
a way that in each row and in each column each of the v elements occurs at most once.


Special cases are the LS and the YD. 


Definition 12.26 A YD is an RCD that is generated from a LS by dropping at 
least one column so that the rows form a BIBD. Therefore a YD has exactly v 
rows and c < v columns. 

If one column is dropped, a YD certainly results. If more columns are 
dropped, balance must be checked. 


Definition 12.27 A groups period design (GPD) is an experimental design, in 
which the experimental units are investigated in successive periods, the groups 
correspond to the rows and the periods to the columns of an RCD. 


GPD were first used in feeding experiments with animals. The groups of animals
were fed differently in the periods of observation. In general, a GPD is an RCD with
the experimental units as rows and the periods of observation as columns. More
about this can be found in Johnson (2010) and Raghavarao and Padgett (2014).


12.4 Factorial Designs 


Factorial designs are only briefly defined here to complete this chapter.
Originally, the idea of such designs was developed in Fisher (1935). A general
description can be found in Mukerjee and Wu (2006). Factorial designs play
a fundamental role in efficient experimentation with multiple input variables
and are used in various fields of application, including engineering, agriculture
and life sciences. The factors are not applied and observed one after the other
but at the same time. This can save time and costs. Fractional factorial designs
are described in Gunst and Mason (2009).


Definition 12.28 An experiment with p ≥ 2 (treatment) factors F_i (i = 1, ..., p)
arranged so that these p factors occur at the same time with different levels in
this experiment is said to be a factorial experiment or a factorial design with p
factors. If s_i ≥ 2 is the number of levels of the ith factor (i = 1, ..., p), the
factorial experiment is called an (s_1, s_2, ..., s_p) factorial design. Experiments with
s_1 = s_2 = ... = s_p = s are symmetric, all other experiments are asymmetric.
Symmetric experiments with s levels of p factors are s^p-experiments. If not all factor
level combinations occur in a factorial design but some conditions are fulfilled,
we speak about fractional factorial designs.


If in a factorial design for N experimental objects it is counted how many of these
objects belong to each factor level combination, the result is a contingency table.

If the factors are qualitative and the observed character is quantitative, we can
analyse the design by ANOVA.

If the factors are quantitative and the observed character is quantitative, we
can analyse the design by regression analysis.

More details can be found in Rasch et al. (2011) and in Rasch et al. (2008). 


12.5 Programs for Construction of Experimental Designs 


With the R package OPDOE, available on CRAN, completely and partially balanced
block designs and fractional factorial designs can be constructed. The following is an
example of the construction of a BIBD with v = b = 15, k = r = 7, λ = 3.

The command is


> make.BIBD(s=2, n=3, m=2, method=3)


As the result we obtain 


Balanced Incomplete Block Design: BIBD(15,15,7,7,3)
(1, 2, 3, 4, 5, 6, 7)      (1, 2, 3, 8, 9,10,11)
(1, 2, 3,12,13,14,15)      (1, 4, 5, 8, 9,12,13)
(1, 4, 5,10,11,14,15)      (1, 6, 7, 8, 9,14,15)
(1, 6, 7,10,11,12,13)      (2, 4, 6, 8,10,12,14)
(2, 4, 6, 9,11,13,15)      (2, 5, 7, 8,10,13,15)
(2, 5, 7, 9,11,12,14)      (3, 4, 7, 8,11,12,15)
(3, 4, 7, 9,10,13,14)      (3, 5, 6, 8,11,13,14)
(3, 5, 6, 9,10,12,15)


The method 3 of the program is the method 1 in this chapter. 


12.6 Exercises 


12.1 Randomise the trivial BIBD:

(1,2,3) (1,3,6) (1,6,7) (2,4,7) (3,5,6)
(1,2,4) (1,3,7) (2,3,4) (2,5,6) (3,5,7)
(1,2,5) (1,4,5) (2,3,5) (2,5,7) (3,6,7)
(1,2,6) (1,4,6) (2,3,6) (2,6,7) (4,5,6)
(1,2,7) (1,4,7) (2,3,7) (3,4,5) (4,5,7)
(1,3,4) (1,5,6) (2,4,5) (3,4,6) (4,6,7)
(1,3,5) (1,5,7) (2,4,6) (3,4,7) (5,6,7).

12.2 Construct the dual BIBD to the BIBD with the parameters b = 7, r = 3,
λ = 1 and the incidence matrix

1110000
1001100
0101001
1000011
0011010
0010101
0100110

Write the generated BIBD in bracket form.

12.3 Give the parameters of a BIBD constructed by a PG(3,4).

12.4 Give the parameters of a BIBD constructed by an EG(3,4).

12.5 Construct a BIBD by Method 12.3 with λ = 2.

12.6 Give the parameters of a BIBD constructed by Method 12.4 with m = 4.

12.7 Show the equivalence of a) and b) in Definition 12.6.

12.8 Transform the LS of Example 12.21 by interchanging the rows into a
semi-standardised LS.

12.9 In the LS of Example 12.21, drop the two last columns and check whether
the result is a YD.
References 


Abel, R. J. R. and Greig, M. (1998) Balanced incomplete block designs with block size 
7. J. Des. Codes Crypt., 13, 5-30. 

Abel, R. J. R., Bluskov, I. and Greig, M. (2001) Balanced incomplete block designs 
with block size 8. J. Comb. Des., 9, 233-268. 

Abel, R. J. R., Bluskov, I. and Greig, M. (2002a) Balanced incomplete block designs 
with block size 9 and A = 2,4,8. Des. Codes Crypt., 26, 33-59. 

Abel, R. J. R., Bluskov, I. and Greig, M. (2002b) Balanced incomplete block designs 
with block size 9, II. Aust. J. Combin., 30, 57—73. 

Abel, R. J. R., Bluskov, I. and Greig, M. (2004) Balanced incomplete block designs 
with block size 9, If. Discret. Math., 279, 5-32. 



Bailey, R. A. (1984) Quasi-complete Latin squares: construction and randomisation. 
J. R. Stat. Soc. B., 46, 323-334. 

Bose, R. C. and Shrikhande, S. S. (1960) On the construction of sets of mutually 
orthogonal Latin squares and the falsity of a conjecture of Euler. Trans. Am. 
Math. Soc., 95, 191-209. 

Colbourn, C. J. and Dinitz, J. H. (2006) The CRC Handbook of Combinatorial 
Designs. Chapman and Hall, Boca Raton. 

Collins, J. R. (1976) Constructing BIBDs with a computer. Ars Comb., 1, 187-231. 

Euler, L. (1782) Recherches sur une nouvelle espèce de quarrés magiques,
Verhandelingen Zeeuwsch Genootschap der Wetenschappen te Vlissingen 9,
Middelburg, 85-239. Reprinted Communicationes arithmeticae 2, 1849,
202-361.

Fisher, R. A. (1935) The Design of Experiments, Oliver & Boyd, Edinburgh. 

Fisher, R. A. and Yates, F. (1963) Statistical Tables for Biological, Agricultural and 
Medical Research, 6th edition, Oliver & Boyd, Edinburgh and London. 

Freeman, G. H. (1979) Complete Latin squares and related experimental designs. 
J. R. Stat. Soc. B., 41, 253-262. 

Gunst, R. F. and Mason, R. L. (2009) Fractional Factorial Design. John Wiley & Sons, 
Inc., New York. 

Hanani, H. (1961) The existence and construction of balanced incomplete block 
designs. Ann. Math. Stat., 32, 361-386. 

Hanani, H. (1975) Balanced incomplete block designs and related designs. Discrete 
Mathem., 11, 275-289. 

Hanani, H. (1989) BIBD’s with block-size seven. Discret. Math., 77, 89-96. 

Johnson, D. E. (2010) Crossover experiments. Comput. Stat., 2, 620-625. 

Mathon, R. and Rosa, A. (2006) 2-(v; k;A) designs of small order, in Colbourn, C. J. 
and Dinitz, J. H. (Eds.), Handbook of Combinatorial Designs, 2nd edition, 
Chapman & Hall/CRC, Boca Raton, pp. 25-58. 

Mohan, R. N., Kageyama, S. and Nair, M. N. (2004) On a characterization of 
symmetric balanced incomplete block designs. Discussiones Mathematicae, 
Probability and Statistics, 24, 41-58. 

Mukerjee, R. and Wu, C. F. J (2006) A Modern Theory of Factorial Design, Springer, 
New York. 

Raghavarao, D. (1971) Constructions and Combinatorial Problems in Design of 
Experiments, John Wiley & Sons, Inc., New York. 

Raghavarao, D. and Padgett, L. (2014) Repeated Measurements and Cross-Over 
Designs, John Wiley & Sons, Inc., New York. 

Rao, C. R. (1961) A study of BIB designs with replications 11 to 15. Sankhya Ser. A, 
23, 117-129. 

Rasch, D., Herrendörfer, G., Bock, J., Victor, N. and Guiard, V. (Eds.) (2008)
Verfahrensbibliothek Versuchsplanung und -auswertung. 2. verbesserte Auflage
in einem Band mit CD, R. Oldenbourg Verlag, München.

Rasch, D., Pilz, J., Verdooren, R. L. and Gebhardt, A. (2011) Optimal Experimental
Design with R, Chapman and Hall, Boca Raton.

Rasch, D., Teuscher, F. and Verdooren, R. L. (2016) A conjecture about BIBDs. 
Commun. Stat. Simul. Comput., 45, 1526-1537. 

Sprott, D. A. (1962) A list of BIB designs with r = 16 to 20. Sankhya Ser. A., 24, 
203-204. 

Sun, H. M. (2012) On the existence of simple BIBDs with number of elements a 
prime power. J. Comb. Des., 21, 47-59. 

Takeuchi, K. (1962) A table of difference sets generating balanced incomplete block 
designs. Rev. Int. Stat. Inst., 30, 361-366. 

Tarry, G. (1900) Le Problème de 36 Officiers. Compte Rendu de l’Association
Française pour l’Avancement de Science Naturel, Secrétariat de l’Association,
1, 122-123.

Tarry, G. (1901) Le Problème de 36 Officiers. Compte Rendu de l’Association
Française pour l’Avancement de Science Naturel, Secrétariat de l’Association,
2, 170-203.

Tocher, K. D. (1952) The design and analysis of block experiments. J. R. Stat. Soc. B., 
14, 45-91. 

Wilson, R. M. (1974) Concerning the number of mutually orthogonal Latin squares. 
Discret. Math., 9, 181-198. 




Appendix A: Symbolism 


Our notation partly differs from that of other mathematical disciplines. We do
not use capital letters as in probability theory to denote random variables
but denote them by bold printing. We do this not only to distinguish between
a random variable $\boldsymbol{F}$ with F-distribution and its realisation F but mainly because
linear models are important in this book. In a mixed model in the two-way
cross-classification of the analysis of variance with a fixed factor A and a random
factor B, the model equation with capital letters is written as

$Y_{ijk} = \mu + a_i + B_j + (aB)_{ij} + E_{ijk}$.

This looks strange and is unusual. We use instead

$\boldsymbol{y}_{ijk} = \mu + a_i + \boldsymbol{b}_j + (\boldsymbol{ab})_{ij} + \boldsymbol{e}_{ijk}$.

Functions are never written without an argument to avoid confusion. So p(y) is
often a probability function but p a probability. Further, f(y) is a density function
but f the symbol for degrees of freedom.


Sense                                                          Symbol

Rounding-up function                                           ⌈x⌉ = smallest integer ≥ x
Binomial distribution with parameters n, p                     B(n, p)
Chi-squared (χ²) distribution with f degrees of freedom        CS(f)
Determinant of the matrix A                                    |A|, det(A)
Diagonal matrix of order n                                     diag(d_1, ..., d_n)
Direct product of the sets A and B                             A ⊗ B
Direct sum of the sets A and B                                 A ⊕ B
Identity matrix of order n                                     I_n
(n × m) matrix with only zeros                                 O_{nm}
(n × m) matrix with only ones                                  1_{nm}
Euclidean space of dimension n and 1, respectively
  (real axis). Positive real axis                              R^n; R^1 = R; R^+
y is distributed as
Indicator function                                             If A is a set, then I_A(x) = 1, if x ∈ A; I_A(x) = 0, if x ∉ A
Interval on the x-axis
  Open                                                         (a, b): a < x < b
  Half open                                                    [a, b): a ≤ x < b; (a, b]: a < x ≤ b
  Closed                                                       [a, b]: a ≤ x ≤ b
ith-order statistic                                            y_(i)
Cardinality (number) of elements in S                          card(S); |S|
Constant in formulae                                           const.
Kronecker product of matrices N_1 and N_2                      N = N_1 ⊗ N_2
Empty set                                                      ∅
Multivariate normal distribution with expectation vector μ
  and covariance matrix Σ                                      N(μ, Σ)
Normal distribution with expectation μ and variance σ²         N(μ, σ²)
Null vector with n elements                                    0_n
Vector with n ones                                             1_n
Parameter space                                                Ω
Poisson distribution with parameter λ                          P(λ)
P-quantile of the N(0, 1) distribution                         z(P) or z_P (see Table D.3 last line)
P-quantile of the χ² distribution with f degrees of freedom    CS(f | P) (see Table D.4)
P-quantile of the t-distribution with f degrees of freedom     t(f | P) (see Table D.3)
P-quantile of the F-distribution with f_1 and f_2 degrees
  of freedom                                                   F(f_1, f_2 | P) = F_P(f_1, f_2) (see Table D.5)
Rank of matrix A                                               rk(A)
Rank space of matrix A                                         R[A]
Standard normal distribution with expectation 0; variance 1    N(0, 1)
Trace of matrix A                                              tr(A)
Transposed vector of Y
Vector (column vector)
Distribution function of a N(0, 1) distribution
Density function of a N(0, 1) distribution
Random variable (bold print)
Appendix B: Abbreviations


ASN       average sample number
BAN       best asymptotic normal (estimator)
BIBD      balanced incomplete block design
BLUE      best linear unbiased estimator
BLUP      best linear unbiased prediction
BQUE      best quadratic unbiased estimator
df        degrees of freedom
iff       if and only if
LS        Latin square
LSE       least squares estimator
LSM       least squares method
LVUE      locally variance-optimal unbiased estimator
MINQUE    minimum quadratics norm estimator
ML        maximum likelihood
MLE       maximum likelihood estimator
MS        mean squares
MSD       mean square deviation
PBIBD     partially balanced incomplete block design
RCD       row-column design
REML      restricted maximum likelihood
SLRT      sequential likelihood ratio test
SS        sum of squares
UMP       uniformly most powerful (test)
UMPU      uniformly most powerful unbiased (test)
UVUE      uniformly variance-optimal unbiased estimator
w.l.o.g.  without loss of generality
YD        Youden design

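Appendix C: Probability and Density Functions


Bernoulli distribution          $p(y,p) = p^y (1-p)^{1-y}$, $0 < p < 1$, $y = 0, 1$

Beta distribution               $f(y,\theta) = \frac{1}{B(a,b)}\, y^{a-1} (1-y)^{b-1}$, $0 < y < 1$, $0 < a, b < \infty$

Binomial distribution           $p(y,p) = \binom{n}{y} p^y (1-p)^{n-y}$, $0 < p < 1$, $y = 0, \ldots, n$

Exponential family              $f(y,\theta) = h(y)\, e^{\sum_{i=1}^{k} \eta_i(\theta) T_i(y) - B(\theta)}$

Exponential family in
canonical form                  $f(y,\eta) = h(y)\, e^{\sum_{i=1}^{k} \eta_i T_i(y) - A(\eta)}$

Exponential distribution        $f(y,\lambda) = \lambda e^{-\lambda y}$, $\lambda \in R^+$, $y \ge 0$

Geometrical distribution        $p(y,p) = p(1-p)^{y-1}$, $y = 1, 2, \ldots$, $0 < p < 1$

Uniform distribution in (a,b)   $f(y,a,b) = \frac{1}{b-a}$, $a < b$, $a \le y \le b$

Hypergeometric distribution     $p(y,M,N,n) = \frac{\binom{M}{y}\binom{N-M}{n-y}}{\binom{N}{n}}$, $y \in \{0, \ldots, N\}$, $M \le N$ integer

Negative binomial
distribution                    $p(y,p,r) = \binom{y-1}{r-1} p^r (1-p)^{y-r}$, $0 < p < 1$, $y \ge r$, $r \in \{0, 1, \ldots\}$

Normal distribution             $f(y,\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$, $-\infty < \mu, y < \infty$, $\sigma > 0$; see Table D.1

Pareto distribution             $f(y,\theta) = \frac{\theta a^{\theta}}{y^{\theta+1}}$, $y > a > 0$, $\theta \in \Omega = R^+$

Poisson distribution            $p(y,\lambda) = \frac{\lambda^y}{y!}\, e^{-\lambda}$, $\lambda > 0$, $y = 0, 1, 2, \ldots$

Weibull distribution            $f(y,\theta) = \theta a (\theta y)^{a-1} e^{-(\theta y)^a}$, $a > 0$, $y > 0$, $\theta \in \Omega = R^+$
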
Table D.1 Density function $\varphi(z)$ of the standard normal distribution


z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

0.0   .39894  .39892  .39886  .39876  .39862  .39844  .39822  .39797  .39767  .39733
0.1   .39695  .39654  .39608  .39559  .39505  .39448  .39387  .39322  .39253  .39181
0.2   .39104  .39024  .38940  .38853  .38762  .38667  .38568  .38466  .38361  .38251
0.3   .38139  .38023  .37903  .37780  .37654  .37524  .37391  .37255  .37115  .36973
0.4   .36827  .36678  .36526  .36371  .36213  .36053  .35889  .35723  .35553  .35381
0.5   .35207  .35029  .34849  .34667  .34482  .34294  .34105  .33912  .33718  .33521
0.6   .33322  .33121  .32918  .32713  .32506  .32297  .32086  .31874  .31659  .31443
0.7   .31225  .31006  .30785  .30563  .30339  .30114  .29887  .29659  .29431  .29200
0.8   .28969  .28737  .28504  .28269  .28034  .27798  .27562  .27324  .27086  .26848
0.9   .26609  .26369  .26129  .25888  .25647  .25406  .25164  .24923  .24681  .24439
1.0   .24197  .23955  .23713  .23471  .23230  .22988  .22747  .22506  .22265  .22025
1.1   .21785  .21546  .21307  .21069  .20831  .20594  .20357  .20121  .19886  .19652
1.2   .19419  .19186  .18954  .18724  .18494  .18265  .18037  .17810  .17585  .17360
1.3   .17137  .16915  .16694  .16474  .16256  .16038  .15822  .15608  .15395  .15183
1.4   .14973  .14764  .14556  .14350  .14146  .13943  .13742  .13542  .13344  .13147
1.5   .12952  .12758  .12566  .12376  .12188  .12001  .11816  .11632  .11450  .11270
1.6   .11092  .10915  .10741  .10567  .10396  .10226  .10059  .09893  .09728  .09566
1.7   .09405  .09246  .09089  .08933  .08780  .08628  .08478  .08329  .08183  .08038
1.8   .07895  .07754  .07614  .07477  .07341  .07206  .07074  .06943  .06814  .06687
1.9   .06562  .06438  .06316  .06195  .06077  .05959  .05844  .05730  .05618  .05508
2.0   .05399  .05292  .05186  .05082  .04980  .04879  .04780  .04682  .04586  .04491
2.1   .04398  .04307  .04217  .04128  .04041  .03955  .03871  .03788  .03706  .03626
2.2   .03547  .03470  .03394  .03319  .03246  .03174  .03103  .03034  .02965  .02898
2.3   .02833  .02768  .02705  .02643  .02582  .02522  .02463  .02406  .02349  .02294
2.4   .02239  .02186  .02134  .02083  .02033  .01984  .01936  .01888  .01842  .01797
2.5   .01753  .01709  .01667  .01625  .01585  .01545  .01506  .01468  .01431  .01394
2.6   .01358  .01323  .01289  .01256  .01223  .01191  .01160  .01130  .01100  .01071
2.7   .01042  .01014  .00987  .00961  .00935  .00909  .00885  .00861  .00837  .00814
2.8   .00792  .00770  .00748  .00727  .00707  .00687  .00668  .00649  .00631  .00613
2.9   .00595  .00578  .00562  .00545  .00530  .00514  .00499  .00485  .00470  .00457
3.0   .00443  .00327  .00238  .00172  .00123  .00087  .00061  .00042  .00029  .00020
4.0   .00013  .00009  .00006  .00004  .00002  .00002  .00001  .00001

In the rows 3.0 and 4.0 the columns advance in steps of 0.1, that is, the entries belong to z = 3.0, 3.1, …, 3.9 and z = 4.0, 4.1, …, 4.7.


Table D.2 Distribution function $\Phi(z)$, $z \ge 0$, of the standard normal distribution (the values for $z < 0$ follow from $\Phi(-z) = 1 - \Phi(z)$, $z > 0$).


z     0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09

0.0   .500000  .503989  .507978  .511967  .515953  .519939  .523922  .527903  .531881  .535856
0.1   .539828  .543795  .547758  .551717  .555670  .559618  .563559  .567495  .571424  .575345
0.2   .579260  .583166  .587064  .590954  .594835  .598706  .602568  .606420  .610261  .614092
0.3   .617911  .621719  .625516  .629300  .633072  .636831  .640576  .644309  .648027  .651732
0.4   .655422  .659097  .662757  .666402  .670031  .673645  .677242  .680822  .684386  .687933
0.5   .691462  .694974  .698468  .701944  .705401  .708840  .712260  .715661  .719043  .722405
0.6   .725747  .729069  .732371  .735653  .738914  .742154  .745373  .748571  .751748  .754903
0.7   .758036  .761148  .764238  .767305  .770350  .773373  .776373  .779350  .782305  .785236
0.8   .788145  .791030  .793892  .796731  .799546  .802337  .805106  .807850  .810570  .813267
0.9   .815940  .818589  .821214  .823814  .826391  .828944  .831472  .833977  .836457  .838913
1.0   .841345  .843752  .846136  .848495  .850830  .853141  .855428  .857690  .859929  .862143
1.1   .864334  .866500  .868643  .870762  .872857  .874928  .876976  .878999  .881000  .882977
1.2   .884930  .886860  .888767  .890651  .892512  .894350  .896165  .897958  .899727  .901475
1.3   .903199  .904902  .906582  .908241  .909877  .911492  .913085  .914656  .916207  .917736
1.4   .919243  .920730  .922196  .923641  .925066  .926471  .927855  .929219  .930563  .931888
1.5   .933193  .934478  .935744  .936992  .938220  .939429  .940620  .941792  .942947  .944083
1.6   .945201  .946301  .947384  .948449  .949497  .950529  .951543  .952540  .953521  .954486
1.7   .955435  .956367  .957284  .958185  .959071  .959941  .960796  .961636  .962462  .963273
1.8   .964070  .964852  .965621  .966375  .967116  .967843  .968557  .969258  .969946  .970621
1.9   .971284  .971933  .972571  .973197  .973810  .974412  .975002  .975581  .976148  .976705
2.0   .977250  .977784  .978308  .978822  .979325  .979818  .980301  .980774  .981237  .981691
2.1   .982136  .982571  .982997  .983414  .983823  .984222  .984614  .984997  .985371  .985738
2.2   .986097  .986447  .986791  .987126  .987455  .987776  .988089  .988396  .988696  .988989
2.3   .989276  .989556  .989830  .990097  .990358  .990613  .990863  .991106  .991344  .991576
2.4   .991802  .992024  .992240  .992451  .992656  .992857  .993053  .993244  .993431  .993613
2.5   .993790  .993963  .994132  .994297  .994457  .994614  .994766  .994915  .995060  .995201
2.6   .995339  .995473  .995603  .995731  .995855  .995975  .996093  .996207  .996319  .996427
2.7   .996533  .996636  .996736  .996833  .996928  .997020  .997110  .997197  .997282  .997365
2.8   .997445  .997523  .997599  .997673  .997744  .997814  .997882  .997948  .998012  .998074
2.9   .998134  .998193  .998250  .998305  .998359  .998411  .998462  .998511  .998559  .998605
3.0   .998650  .999032  .999313  .999517  .999663  .999767  .999841  .999892  .999928  .999952

In the row 3.0 the columns advance in steps of 0.1, that is, the entries belong to z = 3.0, 3.1, …, 3.9.
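
The entries of Tables D.1 and D.2 can be recomputed numerically. The following Python sketch is not part of the original appendix; it uses only the standard library, and the function names `phi` and `Phi` are chosen here for illustration. It relies on the identity $\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$.

```python
from math import erf, exp, pi, sqrt


def phi(z):
    """Density of the standard normal distribution (Table D.1)."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def Phi(z):
    """Distribution function of the standard normal distribution (Table D.2)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Print a few values in the rounding used by the tables.
for z in (0.0, 1.0, 1.96, 3.1):
    print(f"z = {z:4.2f}  phi = {phi(z):.5f}  Phi = {Phi(z):.6f}")
```

Values for negative arguments follow from the symmetry $\Phi(-z) = 1 - \Phi(z)$, which the function `Phi` reproduces directly.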


Table D.3 P-quantiles of the t-distribution with df degrees of freedom (for df = ∞: P-quantiles of the standard normal distribution).


                                        P
df     0.60     0.70     0.80     0.85     0.90     0.95     0.975     0.99      0.995

1      0.3249   0.7265   1.3764   1.9626   3.0777   6.3138   12.7062   31.8205   63.6567
2      0.2887   0.6172   1.0607   1.3862   1.8856   2.9200   4.3027    6.9646    9.9248
3      0.2767   0.5844   0.9785   1.2498   1.6377   2.3534   3.1824    4.5407    5.8409
4      0.2707   0.5686   0.9410   1.1896   1.5332   2.1318   2.7764    3.7469    4.6041
5      0.2672   0.5594   0.9195   1.1558   1.4759   2.0150   2.5706    3.3649    4.0321
6      0.2648   0.5534   0.9057   1.1342   1.4398   1.9432   2.4469    3.1427    3.7074
7      0.2632   0.5491   0.8960   1.1192   1.4149   1.8946   2.3646    2.9980    3.4995
8      0.2619   0.5459   0.8889   1.1081   1.3968   1.8595   2.3060    2.8965    3.3554
9      0.2610   0.5435   0.8834   1.0997   1.3830   1.8331   2.2622    2.8214    3.2498
10     0.2602   0.5415   0.8791   1.0931   1.3722   1.8125   2.2281    2.7638    3.1693
11     0.2596   0.5399   0.8755   1.0877   1.3634   1.7959   2.2010    2.7181    3.1058
12     0.2590   0.5386   0.8726   1.0832   1.3562   1.7823   2.1788    2.6810    3.0545
13     0.2586   0.5375   0.8702   1.0795   1.3502   1.7709   2.1604    2.6503    3.0123
14     0.2582   0.5366   0.8681   1.0763   1.3450   1.7613   2.1448    2.6245    2.9768
15     0.2579   0.5357   0.8662   1.0735   1.3406   1.7531   2.1314    2.6025    2.9467
16     0.2576   0.5350   0.8647   1.0711   1.3368   1.7459   2.1199    2.5835    2.9208
17     0.2573   0.5344   0.8633   1.0690   1.3334   1.7396   2.1098    2.5669    2.8982
18     0.2571   0.5338   0.8620   1.0672   1.3304   1.7341   2.1009    2.5524    2.8784
19     0.2569   0.5333   0.8610   1.0655   1.3277   1.7291   2.0930    2.5395    2.8609
20     0.2567   0.5329   0.8600   1.0640   1.3253   1.7247   2.0860    2.5280    2.8453
21     0.2566   0.5325   0.8591   1.0627   1.3232   1.7207   2.0796    2.5176    2.8314
22     0.2564   0.5321   0.8583   1.0614   1.3212   1.7171   2.0739    2.5083    2.8188
23     0.2563   0.5317   0.8575   1.0603   1.3195   1.7139   2.0687    2.4999    2.8073
24     0.2562   0.5314   0.8569   1.0593   1.3178   1.7109   2.0639    2.4922    2.7969
25     0.2561   0.5312   0.8562   1.0584   1.3163   1.7081   2.0595    2.4851    2.7874
26     0.2560   0.5309   0.8557   1.0575   1.3150   1.7056   2.0555    2.4786    2.7787
27     0.2559   0.5306   0.8551   1.0567   1.3137   1.7033   2.0518    2.4727    2.7707
28     0.2558   0.5304   0.8546   1.0560   1.3125   1.7011   2.0484    2.4671    2.7633
29     0.2557   0.5302   0.8542   1.0553   1.3114   1.6991   2.0452    2.4620    2.7564
30     0.2556   0.5300   0.8538   1.0547   1.3104   1.6973   2.0423    2.4573    2.7500
40     0.2550   0.5286   0.8507   1.0500   1.3031   1.6839   2.0211    2.4233    2.7045
50     0.2547   0.5278   0.8489   1.0473   1.2987   1.6759   2.0086    2.4033    2.6778
60     0.2545   0.5272   0.8477   1.0455   1.2958   1.6706   2.0003    2.3901    2.6603
70     0.2543   0.5268   0.8468   1.0442   1.2938   1.6669   1.9944    2.3808    2.6479
80     0.2542   0.5265   0.8461   1.0432   1.2922   1.6641   1.9901    2.3739    2.6387
90     0.2541   0.5263   0.8456   1.0424   1.2910   1.6620   1.9867    2.3685    2.6316
100    0.2540   0.5261   0.8452   1.0418   1.2901   1.6602   1.9840    2.3642    2.6259
300    0.2536   0.5250   0.8428   1.0382   1.2844   1.6499   1.9679    2.3451    2.5923
500    0.2535   0.5247   0.8423   1.0375   1.2832   1.6479   1.9647    2.3338    2.5857
∞      0.2533   0.5244   0.8416   1.0364   1.2816   1.6449   1.9600    2.3263    2.5758


Table D.4 P-quantiles CS(df, P) of the $\chi^2$ distribution.


                                                      P
df    0.005       0.010       0.025       0.050       0.100    0.250   0.500   0.750   0.900   0.950   0.975   0.990   0.995

1     3927·10^-8  1571·10^-7  9821·10^-7  3932·10^-6  0.01579  0.1015  0.4549  1.323   2.706   3.841   5.024   6.635   7.879
2     0.01003     0.02010     0.05064     0.1026      0.2107   0.5754  1.386   2.773   4.605   5.991   7.378   9.210   10.60
3     0.07172     0.1148      0.2158      0.3518      0.5844   1.213   2.366   4.108   6.251   7.815   9.348   11.34   12.84
4     0.2070      0.2971      0.4844      0.7107      1.064    1.923   3.357   5.385   7.779   9.488   11.14   13.28   14.86
5     0.4117      0.5543      0.8312      1.145       1.610    2.675   4.351   6.626   9.236   11.07   12.83   15.09   16.75
6     0.6757      0.8721      1.237       1.635       2.204    3.455   5.348   7.841   10.64   12.59   14.45   16.81   18.55
7     0.9893      1.239       1.690       2.167       2.833    4.255   6.346   9.037   12.02   14.07   16.01   18.48   20.28
8     1.344       1.646       2.180       2.733       3.490    5.071   7.344   10.22   13.36   15.51   17.53   20.09   21.96
9     1.735       2.088       2.700       3.325       4.168    5.899   8.343   11.39   14.68   16.92   19.02   21.67   23.59
10    2.156       2.558       3.247       3.940       4.865    6.737   9.342   12.55   15.99   18.31   20.48   23.21   25.19
11    2.603       3.053       3.816       4.575       5.578    7.584   10.34   13.70   17.28   19.68   21.92   24.72   26.76
12    3.074       3.571       4.404       5.226       6.304    8.438   11.34   14.85   18.55   21.03   23.34   26.22   28.30
13    3.565       4.107       5.009       5.892       7.042    9.299   12.34   15.98   19.81   22.36   24.74   27.69   29.82
14    4.075       4.660       5.629       6.571       7.790    10.17   13.34   17.12   21.06   23.68   26.12   29.14   31.32
15    4.601       5.229       6.262       7.261       8.547    11.04   14.34   18.25   22.31   25.00   27.49   30.58   32.80
16    5.142       5.812       6.908       7.962       9.312    11.91   15.34   19.37   23.54   26.30   28.85   32.00   34.27
17    5.697       6.408       7.564       8.672       10.09    12.79   16.34   20.49   24.77   27.59   30.19   33.41   35.72
18    6.265       7.015       8.231       9.390       10.86    13.68   17.34   21.60   25.99   28.87   31.53   34.81   37.16
19    6.844       7.633       8.907       10.12       11.65    14.56   18.34   22.72   27.20   30.14   32.85   36.19   38.58
20    7.434       8.260       9.591       10.85       12.44    15.45   19.34   23.83   28.41   31.41   34.17   37.57   40.00
21    8.034       8.897       10.28       11.59       13.24    16.34   20.34   24.93   29.62   32.67   35.48   38.93   41.40
22    8.643       9.542       10.98       12.34       14.04    17.24   21.34   26.04   30.81   33.92   36.78   40.29   42.80
23    9.260       10.20       11.69       13.09       14.85    18.14   22.34   27.14   32.01   35.17   38.08   41.64   44.18
24    9.886       10.86       12.40       13.85       15.66    19.04   23.34   28.24   33.20   36.42   39.36   42.98   45.56
25    10.52       11.52       13.12       14.61       16.47    19.94   24.34   29.34   34.38   37.65   40.65   44.31   46.93
26    11.16       12.20       13.84       15.38       17.29    20.84   25.34   30.43   35.56   38.89   41.92   45.64   48.29
27    11.81       12.88       14.57       16.15       18.11    21.75   26.34   31.53   36.74   40.11   43.19   46.96   49.64
28    12.46       13.56       15.31       16.93       18.94    22.66   27.34   32.62   37.92   41.34   44.46   48.28   50.99
29    13.12       14.26       16.05       17.71       19.77    23.57   28.34   33.71   39.09   42.56   45.72   49.59   52.34
30    13.79       14.95       16.79       18.49       20.60    24.48   29.34   34.80   40.26   43.77   46.98   50.89   53.67
40    20.71       22.16       24.43       26.51       29.05    33.66   39.34   45.62   51.80   55.76   59.34   63.69   66.77
50    27.99       29.71       32.36       34.76       37.69    42.94   49.33   56.33   63.17   67.50   71.42   76.15   79.49
60    35.53       37.48       40.48       43.19       46.46    52.29   59.33   66.98   74.40   79.08   83.30   88.38   91.95
70    43.28       45.44       48.76       51.74       55.33    61.70   69.33   77.58   85.53   90.53   95.02   100.42  104.22
80    51.17       53.54       57.15       60.39       64.28    71.14   79.33   88.13   96.58   101.88  106.63  112.33  116.32
90    59.20       61.75       65.65       69.13       73.29    80.62   89.33   98.65   107.56  113.14  118.14  124.12  128.30
100   67.33       70.06       74.22       77.93       82.36    90.13   99.33   109.14  118.50  124.34  129.56  135.81  140.17
Table D.5 95 % quantiles of the F-distribution with $f_1$ and $f_2$ degrees of freedom.


                                   f1
f2     1       2       3       4       5       6       7       8       9

1      161.4   199.5   215.7   224.6   230.2   234.0   236.8   238.9   240.5
2      18.51   19.00   19.16   19.25   19.30   19.33   19.35   19.37   19.38
3      10.13   9.55    9.28    9.12    9.01    8.94    8.89    8.85    8.81
4      7.71    6.94    6.59    6.39    6.26    6.16    6.09    6.04    6.00
5      6.61    5.79    5.41    5.19    5.05    4.95    4.88    4.82    4.77
6      5.99    5.14    4.76    4.53    4.39    4.28    4.21    4.15    4.10
7      5.59    4.74    4.35    4.12    3.97    3.87    3.79    3.73    3.68
8      5.32    4.46    4.07    3.84    3.69    3.58    3.50    3.44    3.39
9      5.12    4.26    3.86    3.63    3.48    3.37    3.29    3.23    3.18
10     4.96    4.10    3.71    3.48    3.33    3.22    3.14    3.07    3.02
11     4.84    3.98    3.59    3.36    3.20    3.09    3.01    2.95    2.90
12     4.75    3.89    3.49    3.27    3.11    3.00    2.91    2.85    2.80
13     4.67    3.81    3.41    3.18    3.03    2.92    2.83    2.77    2.71
14     4.60    3.74    3.34    3.11    2.96    2.85    2.76    2.70    2.65
15     4.54    3.68    3.29    3.06    2.90    2.79    2.71    2.64    2.59
16     4.49    3.63    3.24    3.01    2.85    2.74    2.66    2.59    2.54
17     4.45    3.59    3.20    2.96    2.81    2.70    2.61    2.55    2.49
18     4.41    3.55    3.16    2.93    2.77    2.66    2.58    2.51    2.46
19     4.38    3.52    3.13    2.90    2.74    2.63    2.54    2.48    2.42
20     4.35    3.49    3.10    2.87    2.71    2.60    2.51    2.45    2.39
21     4.32    3.47    3.07    2.84    2.68    2.57    2.49    2.42    2.37
22     4.30    3.44    3.05    2.82    2.66    2.55    2.46    2.40    2.34
23     4.28    3.42    3.03    2.80    2.64    2.53    2.44    2.37    2.32
24     4.26    3.40    3.01    2.78    2.62    2.51    2.42    2.36    2.30
25     4.24    3.39    2.99    2.76    2.60    2.49    2.40    2.34    2.28
26     4.23    3.37    2.98    2.74    2.59    2.47    2.39    2.32    2.27
27     4.21    3.35    2.96    2.73    2.57    2.46    2.37    2.31    2.25
28     4.20    3.34    2.95    2.71    2.56    2.45    2.36    2.29    2.24
29     4.18    3.33    2.93    2.70    2.55    2.43    2.35    2.28    2.22
30     4.17    3.32    2.92    2.69    2.53    2.42    2.33    2.27    2.21
40     4.08    3.23    2.84    2.61    2.45    2.34    2.25    2.18    2.12
60     4.00    3.15    2.76    2.53    2.37    2.25    2.17    2.10    2.04
120    3.92    3.07    2.68    2.45    2.29    2.17    2.09    2.02    1.96
∞      3.84    3.00    2.60    2.37    2.21    2.10    2.01    1.94    1.88

Table D.5 (Continued)


                                      f1
f2     10      12      15      20      24      30      40      60      120     ∞

1      241.9   243.9   245.9   248.0   249.1   250.1   251.1   252.2   253.3   254.3
2      19.40   19.41   19.43   19.45   19.45   19.46   19.47   19.48   19.49   19.50
3      8.79    8.74    8.70    8.66    8.64    8.62    8.59    8.57    8.55    8.53
4      5.96    5.91    5.86    5.80    5.77    5.75    5.72    5.69    5.66    5.63
5      4.74    4.68    4.62    4.56    4.53    4.50    4.46    4.43    4.40    4.36
6      4.06    4.00    3.94    3.87    3.84    3.81    3.77    3.74    3.70    3.67
7      3.64    3.57    3.51    3.44    3.41    3.38    3.34    3.30    3.27    3.23
8      3.35    3.28    3.22    3.15    3.12    3.08    3.04    3.01    2.97    2.93
9      3.14    3.07    3.01    2.94    2.90    2.86    2.83    2.79    2.75    2.71
10     2.98    2.91    2.85    2.77    2.74    2.70    2.66    2.62    2.58    2.54
11     2.85    2.79    2.72    2.65    2.61    2.57    2.53    2.49    2.45    2.40
12     2.75    2.69    2.62    2.54    2.51    2.47    2.43    2.38    2.34    2.30
13     2.67    2.60    2.53    2.46    2.42    2.38    2.34    2.30    2.25    2.21
14     2.60    2.53    2.46    2.39    2.35    2.31    2.27    2.22    2.18    2.13
15     2.54    2.48    2.40    2.33    2.29    2.25    2.20    2.16    2.11    2.07
16     2.49    2.42    2.35    2.28    2.24    2.19    2.15    2.11    2.06    2.01
17     2.45    2.38    2.31    2.23    2.19    2.15    2.10    2.06    2.01    1.96
18     2.41    2.34    2.27    2.19    2.15    2.11    2.06    2.02    1.97    1.92
19     2.38    2.31    2.23    2.16    2.11    2.07    2.03    1.98    1.93    1.88
20     2.35    2.28    2.20    2.12    2.08    2.04    1.99    1.95    1.90    1.84
21     2.32    2.25    2.18    2.10    2.05    2.01    1.96    1.92    1.87    1.81
22     2.30    2.23    2.15    2.07    2.03    1.98    1.94    1.89    1.84    1.78
23     2.27    2.20    2.13    2.05    2.01    1.96    1.91    1.86    1.81    1.76
24     2.25    2.18    2.11    2.03    1.98    1.94    1.89    1.84    1.79    1.73
25     2.24    2.16    2.09    2.01    1.96    1.92    1.87    1.82    1.77    1.71
26     2.22    2.15    2.07    1.99    1.95    1.90    1.85    1.80    1.75    1.69
27     2.20    2.13    2.06    1.97    1.93    1.88    1.84    1.79    1.73    1.67
28     2.19    2.12    2.04    1.96    1.91    1.87    1.82    1.77    1.71    1.65
29     2.18    2.10    2.03    1.94    1.90    1.85    1.81    1.75    1.70    1.64
30     2.16    2.09    2.01    1.93    1.89    1.84    1.79    1.74    1.68    1.62
40     2.08    2.00    1.92    1.84    1.79    1.74    1.69    1.64    1.58    1.51
60     1.99    1.92    1.84    1.75    1.70    1.65    1.59    1.53    1.47    1.39
120    1.91    1.83    1.75    1.66    1.61    1.55    1.50    1.43    1.35    1.25
∞      1.83    1.75    1.67    1.57    1.52    1.46    1.39    1.32    1.22    1.00


Solutions and Hints for Exercises 


Chapter 1 


Exercise 1.1 


The sample is not random because inhabitants without an entry in the telephone
book cannot be selected.


Exercise 1.2 


It is recommended to use SPSS to avoid long-winded calculations by hand.
Write the 81 different quadruples of the numbers 1, 2, 3 that arise from random
sampling with replacement into the columns y1, y2, y3, y4 of an SPSS data sheet
(Statistics Data Editor). In the command sequence 'Transform – Compute Variable',
name the target variable 'Mean' and form (y1 + y2 + y3 + y4)/4 using
the command MEAN = MEAN(y1,y2,y3,y4). See also the SPSS syntax below.
The mean values now occur in column 5 of the data sheet. Analogously, create
the variable s2 = VARIANCE(y1,y2,y3,y4); s2 is then given in column 6 of the
data sheet.

After running the command sequence 'Analyze – Descriptive Statistics –
Descriptive', the mean value and the variance of the population are calculated
from the means (set under Options) of MEAN and s2. The value of the
VARIANCE of the variable MEAN must be multiplied by (N−1)/N to get the
population variance (2/3)/4 of the sample mean, because from the population
of N = 81 samples of size 4 SPSS calculates a sample variance with denominator
N−1. The corresponding graphical representations are obtained via 'Graphs –
Legacy Dialogs – Bar'.


Mathematical Statistics, First Edition. Dieter Rasch and Dieter Schott. 
© 2018 John Wiley & Sons Ltd. Published 2018 by John Wiley & Sons Ltd. 
SPSS output


Descriptive statistics


                     N    Minimum   Maximum   Mean     Std. deviation   Variance

Mean                 81   1.00      3.00      2.0000   .41079           .16875
Valid N (listwise)   81


Descriptive statistics


                     N    Minimum   Maximum   Mean

s2                   81   .00       1.33      .6667
Valid N (listwise)   81


Population mean = 2 and population variance s2 = .6667 are obtained.


Remark 

The population variance is $\sigma^2 = 2/3$. The population variance of a sample
mean of size 4 with replacement is $\sigma^2/4$. From the population of N = 81 sample
means of possible different samples of size 4, the package SPSS calculates $S^2/4$
with $S^2 = N\sigma^2/(N-1)$.
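
The SPSS steps can be cross-checked by enumerating the 81 samples directly. The following Python sketch is an illustration only and not the SPSS syntax referred to in the solution; the helper `sample_variance` and all variable names are chosen here. Exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction
from itertools import product


def sample_variance(values):
    """Variance with denominator len(values) - 1, as SPSS reports it."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / (n - 1)


population = [1, 2, 3]
samples = list(product(population, repeat=4))  # all 3**4 = 81 ordered quadruples
N = len(samples)

means = [Fraction(sum(s), 4) for s in samples]  # corresponds to column MEAN
s2 = [sample_variance(s) for s in samples]      # corresponds to column s2

mean_of_means = sum(means) / N                  # population mean
mean_of_s2 = sum(s2) / N                        # population variance sigma^2
spss_variance_of_mean = sample_variance(means)  # denominator N - 1
population_variance_of_mean = spss_variance_of_mean * (N - 1) / N

print(N, mean_of_means, mean_of_s2)
print(float(spss_variance_of_mean), population_variance_of_mean)
```

The enumeration gives the mean 2, the mean of s2 equal to 2/3, the value 0.16875 for the variance as SPSS computes it, and (2/3)/4 = 1/6 after multiplying by (N−1)/N.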


Exercise 1.3 
The conditional distributions are as follows: 
t! 
a) P(Y=Y|M(Y) =t) = Tent 
a TE! 
I =miny; = max); 
De ean seen came sia 
n(n-1) (ym) -¥a)) 


L O0< =miny; < maxy; = 
c) f(Y|ay = Pear) 
MV (n) 
d) M(Y) is gamma distributed. 
Exercise 1.4 
The sufficient statistics are: 
a) M= In] [}_ 19; 
b) M=S37_.9%, 
c) M= In] [}_ 197 
Exercise 1.5
The minimal sufficient statistics are: 
a) M= 719i: 
b) M= (H41j--¥a)) 
c) i) M= 77.19); 
ii) M= (Yay-¥)> 


d) i) M =IT7_1Y;, 
ii) M=IT7_,(1-9;). 


Exercise 1.6 


a) Apply the uniqueness theorem for power series (concerning the fact that the 
coefficients of power series are uniquely determined). 


b) If $\int_{\theta_1}^{\theta_2} h(y)\,dy = 0$ holds for each interval $(\theta_1, \theta_2) \subset R^1$, then the integrable
function $h(y)$ has to be almost everywhere identical to 0.
Exercise 1.7 


The statistic $y_{(n)}$ has the density function $f(t) = \frac{n}{\theta^n} t^{n-1}$, $0 < t < \theta$. The family of
these distributions ($\theta \in R^+$) is complete. This can be proven as in Exercise
1.6 (b).


Exercise 1.8 


Let $h(y)$ be an arbitrary discrete function with $E(h(y)) = 0$ for all $\theta \in (0,1)$. For
$\theta = 0$ it follows $h(0) = 0$, and putting $y = k$ further,

$$\sum_{k=1}^{\infty} h(k)\theta^{k-1} = -\frac{h(-1)}{(1-\theta)^2} = -h(-1)\sum_{k=1}^{\infty} k\theta^{k-1}, \quad \theta \in (0,1).$$

Because of the uniqueness theorem for power series (compare Exercise 1.6), we
get $h(k) = -k\,h(-1)$, $k = 1, 2, \ldots$

If $h(y)$ is bounded, then $h(-1) = 0$ and therefore $h(y) \equiv 0$. On the other hand,
for $h(-1) = -1$, the function

$$h(y) = \begin{cases} y & \text{for } y = -1, 0, 1, \ldots \\ 0 & \text{else} \end{cases}$$

is an unbiased estimator of zero.
Exercise 1.9
We obtain for the Fisher information the expressions 
n"(0)B'(8) 
a) I(0) =B" (0) -————_., 
(0) =B"(0)-" 
>) i) 1(p)=—" 
(p) =p)’ 


= 


1 
iti) 1(8) = 
+2 


iv) I(o) 


Exercise 1.10 


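a) This follows by considering $E\left(\frac{\partial}{\partial\theta_i} \ln L(y,\theta)\right) = 0$, $i = 1, 2, \ldots, p$.

b) This can be shown analogously as $I(\theta) = -E\left(\frac{\partial^2}{\partial\theta^2} \ln L(y,\theta)\right)$ following the
derivation after Definition 1.10.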


Exercise 1.11 


Let be M = M(Y). 


1 
a) E(M)=e~°, var(M)=e-°(1-e~®), I(0)= a var(M) > @ e~*? 
b) ny is P(n0) — distributed (Poisson). 
-20 
E(M)=e®, var(M)=e**(e%”"-1), 1(6)=", var(M) > a 
é n 
(i) 
1 1 n do 

c) E(M) = ry var(M) = ee I(@) = Re var(M) > nl(8) > Pe? (n>1) 

with g(0) =E(M) = ; 
Exercise 1.12 
a)

i                         1      2      3      4      5      6      7      8      9

$R(d_i(y), \theta_1)$     0      7      3.5    3      10     6.5    1.5    8.5    5
$R(d_i(y), \theta_2)$     12     7.6    9.6    5.4    1      3      8.4    4      6

where $R(d(y), \theta) = L(d(0), \theta)\,p_\theta(0) + L(d(1), \theta)\,p_\theta(1)$.

b)

i                                1      2      3      4      5      6      7      8      9

$\max_j \{R(d_i, \theta_j)\}$    12     7.6    9.6    5.4    10     6.5    8.4    8.5    6

$\min_i \max_j \{R(d_i, \theta_j)\} = R(d_4, \theta_2) = 5.4$, minimax decision function $d_M = d_4(y)$.

c) $r(d_i, \pi) = E[R(d_i(y), \theta)] = R(d_i(y), \theta_1)\,\pi(\theta_1) + R(d_i(y), \theta_2)\,\pi(\theta_2)$.

i                  1      2      3      4      5      6      7      8      9

$r(d_i, \pi)$      9.6    7.48   8.38   4.92   2.8    3.7    7.02   4.9    5.8

$\min_i \{r(d_i, \pi)\} = 2.8$, Bayesian decision function $d_B = d_5(y)$.
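
The minimax and Bayes choices can be recomputed from the risk table in (a). The Python sketch below is illustrative; the prior weights 0.2 for the first and 0.8 for the second parameter value are an assumption inferred here from the tabulated Bayes risks (they reproduce all nine values), not quoted from the exercise, and the variable names are hypothetical.

```python
from fractions import Fraction

# Risks from part (a): one entry per decision function d_1, ..., d_9.
risk_theta1 = [Fraction(v) for v in "0 7 3.5 3 10 6.5 1.5 8.5 5".split()]
risk_theta2 = [Fraction(v) for v in "12 7.6 9.6 5.4 1 3 8.4 4 6".split()]

# Assumed prior (pi(theta_1), pi(theta_2)), inferred from the table in (c).
prior = (Fraction("0.2"), Fraction("0.8"))

# Minimax: minimise the maximal risk over the two parameter values.
max_risk = [max(r1, r2) for r1, r2 in zip(risk_theta1, risk_theta2)]
minimax_index = min(range(9), key=max_risk.__getitem__) + 1

# Bayes: minimise the risk averaged with respect to the prior.
bayes_risk = [prior[0] * r1 + prior[1] * r2
              for r1, r2 in zip(risk_theta1, risk_theta2)]
bayes_index = min(range(9), key=bayes_risk.__getitem__) + 1

print("minimax: d_%d with risk %s" % (minimax_index, float(max_risk[minimax_index - 1])))
print("Bayes:   d_%d with risk %s" % (bayes_index, float(bayes_risk[bayes_index - 1])))
```

Under this assumed prior the sketch selects the fourth decision function as minimax (maximal risk 5.4) and the fifth as Bayesian (Bayes risk 2.8), in agreement with (b) and (c).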


Exercise 1.13 
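a) $$R(d_{r,s}(Y), \theta) = \begin{cases} c\,\Phi((\theta - r)\sqrt{n}) + b\,\Phi((\theta - s)\sqrt{n}) & \text{for } \theta < 0, \\ b\,[1 - \Phi(\sqrt{n}\,s) + \Phi(\sqrt{n}\,r)] & \text{for } \theta = 0, \\ b\,\Phi(\sqrt{n}\,(r - \theta)) + c\,\Phi((s - \theta)\sqrt{n}) & \text{for } \theta > 0. \end{cases}$$

b) i) $$R(d_{-1,1}(y), \theta) = \begin{cases} \Phi(\theta + 1) + \Phi(\theta - 1) & \text{for } \theta < 0, \\ 2\Phi(-1) & \text{for } \theta = 0, \\ 2 - \Phi(\theta + 1) - \Phi(\theta - 1) & \text{for } \theta > 0. \end{cases}$$

ii) $$R(d_{-1,2}(y), \theta) = \begin{cases} \Phi(\theta + 1) + \Phi(\theta - 2) & \text{for } \theta < 0, \\ \Phi(-2) + \Phi(-1) & \text{for } \theta = 0, \\ 2 - \Phi(\theta + 1) - \Phi(\theta - 2) & \text{for } \theta > 0. \end{cases}$$

$d_{-1,1}(Y)$ is for $\theta > 0$ 'better' than $d_{-1,2}(Y)$ (in the sense of a smaller risk).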
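
The comparison of the two risk functions can be checked numerically. The following Python sketch implements the piecewise risk from part (a); the choice n = 1 and b = c = 1 for part (b) is an assumption read off by comparing (b) with (a), and the function names `Phi` and `risk` are chosen here for illustration.

```python
from math import erf, sqrt


def Phi(x):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def risk(theta, r, s, n=1, b=1.0, c=1.0):
    """Piecewise risk of d_{r,s} as given in part (a)."""
    root_n = sqrt(n)
    if theta < 0:
        return c * Phi((theta - r) * root_n) + b * Phi((theta - s) * root_n)
    if theta == 0:
        return b * (1.0 - Phi(root_n * s) + Phi(root_n * r))
    return b * Phi(root_n * (r - theta)) + c * Phi((s - theta) * root_n)


# Compare d_{-1,1} and d_{-1,2} for some positive and negative theta.
for theta in (-2.0, -0.5, 0.5, 1.0, 2.0):
    print(theta, round(risk(theta, -1, 1), 4), round(risk(theta, -1, 2), 4))
```

For positive $\theta$ the first column of risks is the smaller one, which is the statement above; for negative $\theta$ the order is reversed.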


Chapter 2 


Exercise 2.1 


a) ype 4P(v=k)=1, 
a for y=-1, 


b) U(y) = aeR', 
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0 for y=-1,0 
) i) So(y)=4 1 for y=1,3, E(So) =p, 
2 for y=2, 
-a for y=-1, 
0 for y=0, 
S 1 
Si(y)=41+2a for y=1, with a=, 
2+2a for y=2, 
1 for y=3, 
1 f ; 
+e — for y=-l, 
il) Soly)= 4 2 E(So(y)) =p(1-P), 
0 for y=0,1,2,3, 
~-a for =-1, 
Cc = 1 
So(y) = 0 for y=0,3, with a= 


2a for y=l, 
-2a for y=2, 


d) S,(y) is only a LVUE, and S5(y) is even a UVUE with E [Ss (y) U (y)| =0. 


Exercise 2.2 
First observe that M(Y) = y y; is completely sufficient. 


a) W(Y)=a5 : 


M(Y)=— 
(Y) N 
rem 2.4. 


Y with E(w(Y)) =p isa UVUE according to Theo- 


1 
b) S(Y)= aN M(Y) is completely sufficient; therefore _—also 


iN 
w(Y) = mapa -S(Y)) with E(w(Y)) =p(1-p). Because of Theorem 
2.4, the function w(Y) is a UVUE. 


Exercise 2.3 


a) $\frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \bar{y})^2$ is distributed as CS(n − 1) and has therefore the
expectation n − 1. This implies the assertion.


b) It is S (Y) = (1 -y)y. Hence, the assertion follows. 
Exercise 2.4

a) Sui (Y) =n) Su(¥) = 2y, 

b) Si (¥)= 59» Sul) = 59, 

c) Svi(¥) each value of the interval [y(,)-1,y)| (MLE not uniquely 
determined!), Sy(Y) =y- 


Exercise 2.5 
Aces (1+ =) Sui(¥) = (1 7) PETE SEIT ap 
b) Smi(¥ ) is complete sufficient and unbiased, that is, a UVUE. 
So(¥) = Sux(¥). 
Eo(Sux(¥)) =1, Eo(Su(¥)) = = 


n+2 


Exercise 2.6 
a) @=%, b=y, G=zZ, 
b) @* =%(1-22) +22(z-5), b =9(1-23) + B(Z-2), 
é* =Z(1-22) + A2(#+9) 
2 2 2 
CPi Vane eameel Renew SRE 2_ 9% 42_ % 42_ % 
with o° =o, +0, +07, anda, 52? Ab 52? Me = 


fe) 
er 


E(a) =a, E(6) =b, E(é)=c, var(a) = fs, var (6) = “6, ee ae 
E(a") =a, E(6) =b, E(é) = 
av 


c 
var(@") = —07 (1-A ar(b') = “oi(1-4), var(é") = o7(1-2). 


n 
Exercise 2.7 


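The problems

$$L = \frac{1}{(\sigma\sqrt{2\pi})^n} \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - f_i(x_i, \theta))^2\right\} \to \max$$

and

$$\sum_{i=1}^{n} (y_i - f_i(x_i, \theta))^2 \to \min$$

are equivalent.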
Exercise 2.8


a) Ou =J, 


b) @u=9-Byx, Bu = 


Exercise 2.9 


Exercise 2.10 


Using the Taylor expansion for ¥/¥ at (4, u), we obtain 
1 1 
Bgie 42 SO 2140 ; 
nL a Ww) yu n 


EU) = 7+ o(se-m) — +o(7) 


Exercise 2.11 


34 
a) By (2.33) we get E(yj) =" + (-1)a jHl,.. Ht. 


b) The assertion follows using the result of (a). 


Exercise 2.12 


1 a 
a) I(a)= op’ e(S(Y)) = avar(S(¥)) because of (2.44). 
b) nauz(Y)=n/y is gamma distributed with the parameters n, a such that 
cipiek MM  s . Career ; ae 2 
a(Y)= we with var(a@(Y)) = 728 fulfilled. Finally, we find e(@Y) =1- 


Exercise 2.13 


It is 


1 
E(aux(Y)) -£(5) = — aa forn— ow. 
n- 
The consistency can be proven using the Chebyshev inequality

$$P\{|Y - E(Y)| \ge \varepsilon\} \le \frac{\mathrm{var}(Y)}{\varepsilon^2}.$$




Exercise 2.14 


% | eee 
= via 
Our = 1+ no 1. 


Taking the (weak) law of large numbers into account, we arrive at 


It is 


: Soy SE(y?) = 2040 forn > o. 
i=1 


Chapter 3 


Exercise 3.1 


a) E(ki(y)|Ho) = £(ko(y)|Ho) =o a (Ha) = 2%, (Ha) = 1, 
b) The test 


oe ‘\ for L(y|Ha) =cL(y|Ho) 
(41 for L(y\|H4) > cL(y|Ho) 


? 


is randomised for c = 0. The test kp(y) cannot be represented in the form (3.5). 


Exercise 3.2 
We put 
Incg-nln a a 
—Po 
a) A=1+ ; 
nln 
Po 
1 for y>A, 
i) K(Y)=4 7(Y) fory=A, if pi >po, 
0 for ¥<A, 
1 for ¥<A, 


ii) kK(Y)=¢ y(Y) for y=A, if pi <po, 


0 for ¥>A; 
b) cg=1.8, y(y)=0.1, P=0.91. 
Exercise 3.3


$$k(Y) = \begin{cases} 1 & \text{for } T > 4, \\ 0.413 & \text{for } T = 4, \\ 0 & \text{for } T < 4, \end{cases} \quad \text{with } T = \sum_{i=1}^{10} y_i, \quad \beta = 0.0214.$$


Exercise 3.4 


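$$\lambda_0 < \lambda_A: \quad k(Y) = \begin{cases} 1 & \text{for } 2n\bar{y}\lambda_0 < CS(2n|\alpha), \\ 0 & \text{for } 2n\bar{y}\lambda_0 > CS(2n|\alpha); \end{cases}$$

$$\lambda_0 > \lambda_A: \quad k(Y) = \begin{cases} 1 & \text{for } 2n\bar{y}\lambda_0 > CS(2n|1-\alpha), \\ 0 & \text{for } 2n\bar{y}\lambda_0 < CS(2n|1-\alpha). \end{cases}$$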
Exercise 3.5 


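Use

$$k(Y) = k^*(M) = \begin{cases} 1 & \text{for } M < M_\alpha, \\ \gamma_\alpha & \text{for } M = M_\alpha, \\ 0 & \text{for } M > M_\alpha, \end{cases}$$

instead of (3.24) and $\alpha$ instead of $1-\alpha$ in the inequality for $M_\alpha$ as well as $M_\alpha$
instead of $M_{1-\alpha}$. The proof is analogous to that of Theorem 3.8.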


Exercise 3.6 


1 for 2Agny > CS(2n|a), 
a) k(Y -{ ee ( ‘ 
Qa). 


A 
b) z(A) = Fp, (7 CS(2n )) , where Fp, is the distribution function of CS(2n). 
”) \ Ag : 


| 
0 for 2pny < CS(2n| 


c) $H_0$ is accepted.


Exercise 3.7 


Exercise 3.8 
a) The existence follows from Theorem 3.8. Putting M(Y) = : i we get 
Ke 1 for M(Y)>2Vn(z1-a+ Vn), 
0 for M(Y)<2V/n(Z1-«+ Vn). 


1 
b) The power function is 7(0)~1-® (G (Z1-a + Vn) - vi) ; 


Exercise 3.9 


a) The existence follows from Theorem 3.11. 
b) e740 —e-%2 =1-a, cye-% —cye- 0% = 0, 

. 1 for y<0.00253 or y>0.3689, 
c) K*(y) = 


1(10,1) = 0.04936 <a=0.05. 


0 else. 


Exercise 3.10 


i; for h, Snp F \/npo(1—-Po)z,_4 
2 


a) k(Y)= 
0 else. 


1 
b) Ho: p= 5 has to be rejected. 


1 
c) Ho: p= 6 has to be accepted. 


Exercise 3.11 


a) Ho: =3.5 has to be rejected. 
b) The probability is 0.68. 

c) 620.065. 

d) Ho: =3.5 has to be rejected. 
Exercise 3.12 

Acception of Ho: <9.5. 
Rejection of Ho : 07 < 6.25. 
Exercise 3.13 


In both cases $H_0$ is accepted.


Exercise 3.14 


Sample size n = 15. 
Exercise 3.15


7 W1= a4’ Wa Va V1l-a 
nOo 1 1 nOo 1 
b) = , b= -1], l3= 00, Ky has th 1- 
Ea a ana) = (xz ) Pres ga eee 
lest mean length. 
9% 
0 for 0<0 or 0= ‘ 
Ja, 
g\" A 
= —]} (1- for 0<0< ; 
c) Wi(9|90) (5) (l-a,) for Tia 


1 g\" f CN ane ey 
—{ ——'| @ or SOs ‘ 
0 2 y/ 1- ay 4/2 


0 for <0 or 82 2 
r0<0 or 0>—, 
oO oO Ya 
(4) n 
W2(8|0) = (5) (l-a) for0<O@<@, 
0 
e\" 4 
1-|— for 09 <0 < — 
@ anes. Va’ 
0 for 0<0, 
Q\" 0 
—] (l-a) for0<@< ; 
W3(|o) = @k ) /1-a 
1 for 0= 
Vv 1l-a 


Only K, is unbiased. 


Exercise 3.16 


a) Kp= SESS ia » Kr= go enna) ; 
2ny 2ny 


b) Kz =[0.0065;+ 00), Kp = [0;0.0189]. 


Exercise 3.17 


Confidence interval 
a 
2 SF (m-1m-11-5) 


a? 2 
SF (m-1,m-1)1-5) 8} 




Exercise 3.18 

a) $\pi(p) = 1 - (1-p)^n$,

b) $E(n|p) = \frac{1}{p}\left(1 - (1-p)^n\right)$,

c) $\alpha = 0.0956$, $\beta = 0.3487$, $E(n|p_0) = 9.56$, $E(n|p_1) = 6.51$.
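
The numbers in (c) follow from the formulas in (a) and (b). The Python sketch below reproduces them under the assumption that n = 10, $p_0$ = 0.01 and $p_1$ = 0.1; these values are inferred here from the results and are not quoted from the exercise text. The function names are hypothetical.

```python
def rejection_probability(p, n):
    """Formula (a): 1 - (1 - p)**n."""
    return 1.0 - (1.0 - p) ** n


def expected_sample_size(p, n):
    """Formula (b): (1 - (1 - p)**n) / p."""
    return rejection_probability(p, n) / p


# Assumed design values (inferred, see the lead-in).
n, p0, p1 = 10, 0.01, 0.1

alpha = rejection_probability(p0, n)        # risk of the first kind
beta = 1.0 - rejection_probability(p1, n)   # risk of the second kind
e_n_p0 = expected_sample_size(p0, n)
e_n_p1 = expected_sample_size(p1, n)

print(round(alpha, 4), round(beta, 4), round(e_n_p0, 2), round(e_n_p1, 2))
```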


Exercise 3.19 


Sample sizes
a) n = 139,
b) n = 45.


Exercise 3.20 
a) Use the test statistic of the Welch test, namely,

$$t^* = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$

b) (b1) $n_1 = 206$; $n_2 = 103$,
(b2) $n_1 = 64$; $n_2 = 32$.
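
As an illustration of the Welch statistic in (a), the following Python sketch evaluates it for two small samples. The data are hypothetical and chosen only to make the arithmetic easy to follow; the function name `welch_t` is likewise an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Welch test statistic: difference of means over its estimated standard error."""
    n1, n2 = len(sample1), len(sample2)
    standard_error = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (mean(sample1) - mean(sample2)) / standard_error


# Hypothetical samples with clearly unequal variances.
sample1 = [1, 2, 3, 4]
sample2 = [2, 4, 6, 8, 10]
t_star = welch_t(sample1, sample2)
print(t_star)
```

Here the sample variances are 5/3 and 10, so the squared standard error is (5/3)/4 + 10/5 = 29/12, and the statistic is −3.5 divided by the square root of that value.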
Exercise 3.21 


Proceed analogously to Section 3.4.2.1. 


Chapter 4 


Exercise 4.1 


The equation $C^T b = 0$ ($b \in R^n$) defines the null space of the (p × n) matrix $C^T$.
Since the columns of C are supposed to be an orthonormal basis of the p-dimensional
linear subspace $\Omega$, it is the rank space (range) of C, and the null
space of $C^T$ is its (n − p)-dimensional orthogonal complement.


Exercise 4.2 

We know that the (n × p) matrix X has the rank p > 0. The second derivative of
$\|Y - X\beta\|^2$ with respect to $\beta$ is equal to $2X^TX$, and it is therefore positive definite.
Exercise 4.3 

We put $B = X - XGX^TX$ and obtain

$$B^TB = (X - XGX^TX)^T (X - XGX^TX) = X^TXG^TX^TXGX^TX - X^TXG^TX^TX = O.$$


This implies the assertion. 
Exercise 4.4


Because of $X^T(I_n - XGX^T) = X^T - X^TXGX^T$, we can continue as in Exercise 4.3.


Exercise 4.5 


Obviously the matrix 


Wee a 


n n 


is symmetric, idempotent (A? = A) and of rank 1. 


Chapter 5 


Exercise 5.1 


Since a normal distribution is supposed, we have only to show that the covariances
vanish. We demonstrate this briefly for $\mathrm{cov}(\bar{y}_{..}, \bar{y}_{i.} - \bar{y}_{..})$; the other cases
follow analogously.


a b b a b 
ae y= Dj -A) ae Dj ei) 
v= ap b 


cov(y _,¥;, -¥,,) = cov ab - 


= COV 


, 


a b b a b 
se er, a Ds ee 
ab ab b 


a 


Exercise 5.2 


The data are entered as follows (see Figure S1).

Then call 'Analyze – General Linear Model – Univariate' and fill in the
dialog window as follows (see Figure S2).

Pressing 'OK' now supplies the result in Table S1. (The original output was
adapted to the text in the book. More information can be found in the SPSS help
under managing a pivot table.)




Figure S1 Data of Exercise 5.2. Source: Reproduced with permission of IBM. 




Figure S2 Data of Exercise 5.2 and menu. Source: Reproduced with permission of IBM. 


Table S1 Tests of between-subject effects of Exercise 5.2.

Dependent variable: carotene

Source                Type III sum of squares   df   Mean square   F         Sig.
Storage               41.635                     1   41.635        101.696   .000
feedplant             .710                       1   .710          1.734     .213
Storage * feedplant   .907                       1   .907          2.216     .162
Error                 4.913                     12   .409
Total                 888.730                   16

Source: Reproduced with permission of IBM.


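
As a plausibility check of Table S1, the F values can be recomputed from the sums of squares and the degrees of freedom. The sketch below types the table entries in by hand; the dictionary keys are illustrative labels, not SPSS output names. Small deviations from the printed values are due to rounding of the printed sums of squares.

```python
# Sums of squares and degrees of freedom as printed in Table S1.
sum_of_squares = {"Storage": 41.635, "feedplant": 0.710, "Storage * feedplant": 0.907}
degrees_of_freedom = {"Storage": 1, "feedplant": 1, "Storage * feedplant": 1}
ss_error, df_error = 4.913, 12

# Mean square of the error term.
ms_error = ss_error / df_error

# F = (mean square of the effect) / (mean square of the error).
f_ratio = {
    source: (sum_of_squares[source] / degrees_of_freedom[source]) / ms_error
    for source in sum_of_squares
}

for source, value in f_ratio.items():
    print(f"{source:22s} F = {value:.3f}")
```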
Exercise 5.3


Proceed analogously to Exercise 5.2. 


Exercise 5.4 


See the solution of Exercise 4.3. 


Exercise 5.5 


Obviously all three matrices are symmetric. The idempotence can be shown by
performing the products $(B_2 - B_3)(B_2 - B_3)$, $(B_1 - B_2)(B_1 - B_2)$ and $(I_n - B_1)(I_n - B_1)$.
The remaining assertion can easily be seen by calculation.


Exercise 5.6 


Use the latest R packages from https://cran.r-project.org/ and download R for
your computer (see Figure S3).




Figure S3 Start of the R-program. 


Next, select ‘Packages’ on the left-hand side and open the list of available packages by
name. A list of R packages then appears, which also contains OPDOE (see
Figure S4).


Figure S4 List of program packages in R.


Exercise 5.7 


Maximin 40 
Minimin 14 


Exercise 5.8 


Maximin 9 


Minimin 4 


Exercise 5.9 


Factor A 
Maximin 9 
Minimin 4 
Factor B 
Maximin 51 


Minimin 5 


Exercise 5.10 


Maximin 48 


Minimin 5 
Exercise 5.11


Maximin 7 
Minimin 3 
Chapter 6 


Exercise 6.1 


First we put the data into an SPSS data sheet (Statistics Data Editor) and choose
‘Analyze – General Linear Model – Univariate’. With our data we then get the
window shown in Figure S5.

Figure S5 Data of Exercise 6.1 with menu. Source: Reproduced with permission of IBM.


Now we continue with ‘Paste’ and modify the command sequence as described
in Chapter 5 in the part concerning the nested classification. We click the
‘Execute’ button and obtain the following results (see Figure S6).

Figure S6 SPSS-ANOVA table of Exercise 6.1. Source: Reproduced with permission of IBM.


Estimate the variance components via 
Analyze 
General Linear Model 
Variance Components 


Exercise 6.2 


First we find dz = 83.67. 
Choosing a2 = 83, the expression 


2 0.5\* 4 0.5f. 0.5 
acan089)= 1% (05 yy 0st «255 [1-599] boone 


200 17 200 200 


turns out to be greater than the corresponding expression 


2 0.5\7 4 05[. 05 
scans) = Ca ) 5 0.57 +2 1355199 boon 


200 16 200] 200 


for dz = 84. Now look for the optimal solution starting with the pairs 


(a = 83, n = 2); (a = 83, n = 3); (a = 84, n = 2); (a = 84, n = 3).


Exercise 6.3 


The completed data table is given as follows (see Table S2).

Table S2 Data of Exercise 6.3.

Sire
B1       B2       B3       B4       B5       B6       B7       B8       B9       B10
120      152      130      149      110      157      119      150      144      159
155      144      138      107      142      107      158      135      112      105
131      147      123      143      124      146      140      150      123      103
130      103      135      133      109      133      108      125      121      105
140      131      138      139      154      104      138      104      132      144
140      102      152      102      135      119      154      150      144      129
142      102      159      103      118      107      156      140      132      119
146      150      128      110      116      138      145      103      129      100
130      159      137      103      150      147      150      132      103      115
152      132      144      138      148      152      124      128      140      146
115      102      154      122.70   138      124      100      122      106      108
146      160      139.82   122.70   115      142      135.64   154      152      119


If you like, you can calculate the variance components by hand using the
described method of analysis of variance. Alternatively, you can use statistical
software such as SPSS or R.
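
The non-integer entries in Table S2 (122.70, 139.82 and 135.64) look like means of the observed values in the respective columns. The sketch below checks this reading. That the missing values were completed by column means is an inference from the numbers, not something stated in this solution, and the variable names are illustrative.

```python
# Observed (integer) entries of the three columns of Table S2 that contain
# non-integer values; the non-integer entries themselves are left out.
observed = {
    "B3": [130, 138, 123, 135, 138, 152, 159, 128, 137, 144, 154],
    "B4": [149, 107, 143, 133, 139, 102, 103, 110, 103, 138],
    "B7": [119, 158, 140, 108, 138, 154, 156, 145, 150, 124, 100],
}

# Non-integer entries as printed in Table S2.
printed = {"B3": 139.82, "B4": 122.70, "B7": 135.64}

# Column means of the observed values, rounded like the printed entries.
column_means = {sire: round(sum(values) / len(values), 2) for sire, values in observed.items()}

for sire in observed:
    print(sire, column_means[sire], printed[sire])
```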


Chapter 7 


Exercise 7.1 


Use the completed data table of Exercise 6.3, given as Table S2 in the solution
of that exercise.

The random division into two classes can be realised with pseudo-random
numbers that are uniformly distributed in the interval (0,1). A sire is assigned
to class 1 if the result is less than 0.5, and otherwise to class 2. As soon as one
of the two classes contains 6 sires, the remaining sires are put into the other class.
We have a mixed model of twofold nested classification with the fixed factor
‘Location’ and the random factor ‘Sire’.
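
The random division described above can be sketched in Python as follows. The function name, the sire labels B1 to B10 (taken from the column heads of Table S2) and the fixed seed are illustrative assumptions; the seed only makes the example reproducible.

```python
import random


def split_into_two_classes(sires, cap, rng):
    """Assign each sire to class 1 if a uniform(0,1) number is below 0.5,
    otherwise to class 2; once a class holds `cap` sires, all remaining
    sires go to the other class."""
    class1, class2 = [], []
    for sire in sires:
        if len(class1) == cap:
            class2.append(sire)
        elif len(class2) == cap:
            class1.append(sire)
        elif rng.random() < 0.5:
            class1.append(sire)
        else:
            class2.append(sire)
    return class1, class2


sires = [f"B{i}" for i in range(1, 11)]   # labels as in Table S2
class1, class2 = split_into_two_classes(sires, cap=6, rng=random.Random(1))
print("class 1:", class1)
print("class 2:", class2)
```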


Exercise 7.2 


We recommend using SPSS for the solution. Observe the necessary syntax 
modification described in Chapter 5 for nested classification. 
Exercise 7.3


It suffices to estimate the variance components of the factor ‘Sire’ using the
method of analysis of variance by hand. Of course, you can also do it with SPSS.


Chapter 8 


Exercise 8.1 
The partial derivatives of S with respect to $\beta_0$ and $\beta_1$ are

$$\frac{\partial S}{\partial \beta_0} = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i),$$

$$\frac{\partial S}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i).$$

If these derivatives are set to 0, we get the simultaneous equations

$$\sum_{i=1}^{n} y_i - n b_0 - b_1 \sum_{i=1}^{n} x_i = 0,$$

$$\sum_{i=1}^{n} x_i y_i - b_0 \sum_{i=1}^{n} x_i - b_1 \sum_{i=1}^{n} x_i^2 = 0.$$

The first equation supplies (8.10) (if we replace the realisations by random variables).
If we put $b_0 = \bar{y} - b_1 \bar{x}$ into the second equation and use random variables
instead of realisations, Equation (8.9) is obtained after rearrangement.
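
The two normal equations can be solved in closed form. The following sketch does this for a small data set and checks that both equations hold for the solution. The data and the function name are hypothetical and serve only as illustration.

```python
def fit_simple_linear_regression(x, y):
    """Solve the two normal equations of simple linear regression for (b0, b1)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n   # b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
b0, b1 = fit_simple_linear_regression(x, y)

# Left-hand sides of the two normal equations; both should vanish.
residual_eq1 = sum(yi - b0 - b1 * xi for xi, yi in zip(x, y))
residual_eq2 = sum(xi * (yi - b0 - b1 * xi) for xi, yi in zip(x, y))
print(b0, b1, residual_eq1, residual_eq2)
```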


Exercise 8.2 


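Because of $b = \hat{\beta} = (X^TX)^{-1}X^TY$ (see Theorem 8.1), we get

$$E(b) = E\left((X^TX)^{-1}X^TY\right) = (X^TX)^{-1}X^TE(Y) = (X^TX)^{-1}X^TX\beta = \beta$$

and as special cases $E(b_0) = \beta_0$ and $E(b_1) = \beta_1$. Further, it is

$$\mathrm{var}(b) = (X^TX)^{-1}X^T\,\mathrm{var}(Y)\,X(X^TX)^{-1}.$$

Considering now $\mathrm{var}(Y) = \sigma^2 I_n$, we find $\mathrm{var}(b) = \sigma^2 (X^TX)^{-1}$.
In our special case it is

$$X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$$

and therefore

$$X^TX = \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}$$

as well as

$$(X^TX)^{-1} = \frac{1}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}
\begin{pmatrix} \sum_{i=1}^{n} x_i^2 & -\sum_{i=1}^{n} x_i \\ -\sum_{i=1}^{n} x_i & n \end{pmatrix}.$$

This implies (8.14) and (8.15).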


Exercise 8.3 


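Substituting $x_1 = \cos(2x)$, $x_2 = \ln(6x)$, the case is traced back to a twofold linear
regression. In $b = \hat{\beta} = (X^TX)^{-1}X^TY$, we now have to put

$$X = \begin{pmatrix}
1 & \cos(2x_1) & \ln(6x_1) \\
1 & \cos(2x_2) & \ln(6x_2) \\
\vdots & \vdots & \vdots \\
1 & \cos(2x_{n-1}) & \ln(6x_{n-1}) \\
1 & \cos(2x_n) & \ln(6x_n)
\end{pmatrix}.$$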


Exercise 8.4 


After entering the data of Example 8.3 concerning the storage in glass into an SPSS
data sheet (see Figure S7), we select ‘Analyze – Regression – Linear’ and fill in the
box that appears accordingly.

Figure S7 Data of Exercise 8.4 with menu. Source: Reproduced with permission of IBM.


Under ‘Statistics’ we request the covariance matrix of the estimates. The
result is presented after pressing the ‘OK’ button. We deleted the correlation
coefficients there, since we dealt with model I (see Figure S8).

Figure S8 SPSS-output of Exercise 8.4. Source: Reproduced with permission of IBM.


Exercise 8.5


Since we have an odd number (5) of control points, a concrete D-optimal plan
is given by

$$\begin{pmatrix} 1 & 303 \\ 3 & 2 \end{pmatrix}$$

and the concrete G-optimal plan is given by

$$\begin{pmatrix} 1 & 152 & 303 \\ 2 & 1 & 2 \end{pmatrix}.$$

For the D-optimal plan it is

$$X = X_D = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 303 \\ 1 & 303 \end{pmatrix}$$

and for the G-optimal plan it is

$$X = X_G = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 152 \\ 1 & 303 \\ 1 & 303 \end{pmatrix}.$$

Therefore the determinant $|X_G^TX_G| = 456\,020$ of the G-optimal plan is smaller
than the corresponding determinant $|X_D^TX_D| = 547\,224$ of the D-optimal plan,
which maximises $|X^TX|$ for n = 5 in the interval [1; 303].
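
The comparison of the two determinants can be checked with a short computation. For a design matrix with a column of ones and a column of support points x_i, the determinant of X^T X equals n Σx_i² − (Σx_i)². The function and variable names below are illustrative.

```python
def det_xtx(points):
    """Determinant of X^T X for simple linear regression, where X has a
    column of ones and a column with the given support points."""
    n = len(points)
    sum_x = sum(points)
    sum_xx = sum(x * x for x in points)
    return n * sum_xx - sum_x ** 2


d_optimal_points = [1, 1, 1, 303, 303]     # three observations at 1, two at 303
g_optimal_points = [1, 1, 152, 303, 303]   # two at 1, one at 152, two at 303

det_d = det_xtx(d_optimal_points)
det_g = det_xtx(g_optimal_points)
print(det_d, det_g, det_g < det_d)
```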


Chapter 9 


Exercise 9.1 


a) Quasilinear 
b) Quasilinear 

c) Linear 

d) Intrinsically non-linear 
e) Quasilinear 

f) Intrinsically non-linear 
g) Intrinsically non-linear 
Exercise 9.2


The non-linearity parameters are
a) θ2, θ3


Exercise 9.3 


The normal equations for the given n = 11 points serving to determine a, b, c
are non-linear in c. If the first two equations are solved for a and b and the
corresponding values are put into the third equation, then a non-linear equation
g(c) = 0 for c follows, which has to be solved iteratively.

If one of the usual iterative methods is used, initialised, for example, with
$c_0 = -0.5$, then after a few iterations a value $c \approx -0.406$ is obtained. If you want
to check the quality of the iterates $c_k$, you can calculate the values $g(c_k)$, which should
lie close to 0. If c is replaced in the solution formulas for the two other parameters
a and b by its approximate value −0.406, then a = 132.96 and b = −56.43 are
obtained. Hence, the estimated regression function is

$$\hat{f}(x, \hat{\theta}) = 132.96 - 56.43\, e^{-0.406x}.$$

The estimate for the variance can be calculated using the formula

$$s^2 = \frac{1}{n-3} \sum_{i=1}^{n} \left(y_i - a - b\,e^{c x_i}\right)^2.$$

The result is $s^2 = 0.761$.
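
The idea of eliminating a and b and iterating only on c can be sketched as follows. This is not the iteration used in the book: the sketch minimises the residual sum of squares over c with a coarse grid followed by a golden-section search, instead of solving g(c) = 0 directly. The data are synthetic, noise-free values generated from the fitted curve at the hypothetical points x = 0, ..., 10, because the original data are not repeated here. All names are illustrative.

```python
import math


def profile_sse(c, x, y):
    """For fixed c, solve the linear least squares problem in (a, b) for
    y = a + b*exp(c*x) and return (residual sum of squares, a, b)."""
    z = [math.exp(c * xi) for xi in x]
    n = len(x)
    sum_z, sum_y = sum(z), sum(y)
    sum_zz = sum(zi * zi for zi in z)
    sum_zy = sum(zi * yi for zi, yi in zip(z, y))
    b = (n * sum_zy - sum_z * sum_y) / (n * sum_zz - sum_z ** 2)
    a = (sum_y - b * sum_z) / n
    sse = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, y))
    return sse, a, b


def fit_exponential(x, y, c_low=-2.0, c_high=-0.01, grid=200, tol=1e-10):
    """Coarse grid search over c, then golden-section refinement."""
    step = (c_high - c_low) / grid
    candidates = [c_low + k * step for k in range(grid + 1)]
    best = min(range(grid + 1), key=lambda k: profile_sse(candidates[k], x, y)[0])
    low = candidates[max(best - 1, 0)]
    high = candidates[min(best + 1, grid)]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while high - low > tol:
        m1 = high - ratio * (high - low)
        m2 = low + ratio * (high - low)
        if profile_sse(m1, x, y)[0] < profile_sse(m2, x, y)[0]:
            high = m2
        else:
            low = m1
    c = (low + high) / 2.0
    sse, a, b = profile_sse(c, x, y)
    return a, b, c, sse


# Synthetic, noise-free data (assumption: n = 11 points at x = 0, ..., 10).
x = [float(i) for i in range(11)]
y = [132.96 - 56.43 * math.exp(-0.406 * xi) for xi in x]

a_hat, b_hat, c_hat, sse = fit_exponential(x, y)
s2 = sse / (len(x) - 3)   # variance estimate with n - 3 degrees of freedom
print(a_hat, b_hat, c_hat, s2)
```

With real, noisy data the same functions apply unchanged; only the search interval for c may have to be adapted.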

In SPSS we enter the data in a data matrix and program the exponential function
as shown in Figure S9.

Figure S9 Menu for ‘Non-linear Regression’ in SPSS for Exercise 9.3. Source: Reproduced with
permission of IBM.

The result is as follows (see Figure S10).

Figure S10 SPSS-output ‘Non-linear Regression’. Source: Reproduced with permission of IBM.


Exercise 9.4 


First, we select a model using the criterion of residual variance. The best fit is
reached for the arc tan (4) function.
We now use initial values in SPSS – Non-linear Regression as in Figure S11.

Figure S11 The SPSS-program non-linear regression with data of Exercise 9.4.
Source: Reproduced with permission of IBM.

We obtain the results given in Figure S12.

Figure S12 Menu for ‘Non-linear Regression’ in SPSS for Exercise 9.4. Source: Reproduced
with permission of IBM.


Chapter 10 


Exercise 10.1 


Since the storage type is a fixed factor and the instants of time were prescribed
by the experimenter, a model I-I of the form

$$\mu_{ij} = \mu + a_i + \gamma z_{ij}; \quad i = 1, 2;\ j = 1, \ldots, 5$$

is given with the main effects $a_1$ and $a_2$ for the two storage types and the
contents $z_{ij}$ of carotene.

Exercise 10.2 


After the command sequence ‘Analyze – General Linear Model – Univariate’,
the fixed factor and the covariate are entered, as Figure S13 shows.

Figure S13 Data and menu in SPSS for Exercise 10.2. Source: Reproduced with permission
of IBM.


Pressing ‘OK’ gives the result (see Figure S14).

Figure S14 SPSS-output for the analysis of covariance for Exercise 10.2. Source: Reproduced
with permission of IBM.


Chapter 11


The results in this chapter are given in the form of tables (see Table S3, Table S4 and
Table S5).


Exercise 11.1 


Table S3 Sample sizes for Exercise 11.1.


d/σ     1      2      3      4

0.1     1721   1654   1738   1762
0.2     431    414    435    441
0.5     69     67     70     71
1       18     17     18     18


Exercise 11.2 


Table S4 Sample sizes for Exercise 11.2 (α = 0.05).


             β
         0.05   0.1    0.2
d  0.5   105    85     64
   1     27     22     17


Exercise 11.3 


Table S5 Sample sizes for Exercise 11.3 (β = 0.05, d = σ).


          3     4     5     10    20
α  0.05   28    31    33    40    47
   0.1    23    27    29    36    43


The remaining sample sizes for other values of β and d are omitted here.
Chapter 12


Exercise 12.1 


Without a computer you can encode the 35 blocks as the numbers from 1 to
35, write these numbers on separate sheets of paper, put the sheets into a bowl
and draw them at random without replacement. The block belonging to the first
drawn number gets the first place. The randomisation within the blocks can be
realised by throwing a die. For each treatment the die is thrown once: 1 or 4 means
position 1; 2 or 5 means position 2; and finally 3 or 6 means position 3. This can
lead to a repeated rearrangement within the blocks.
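
With a computer, the same randomisation can be sketched in a few lines. The example assumes 35 blocks with three positions each, as in the description above. The fixed seed is only there to make the illustration reproducible, and the variable names are illustrative. Drawing a random permutation of the positions avoids the repeated rearrangement that can occur with the die.

```python
import random

rng = random.Random(2024)   # fixed seed, for a reproducible illustration only

# Random order of the 35 blocks: drawing without replacement.
block_order = rng.sample(range(1, 36), k=35)

# Random order of the three positions within each block.
positions_within_block = {block: rng.sample([1, 2, 3], k=3) for block in block_order}

print(block_order[:5])
print(positions_within_block[block_order[0]])
```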


Exercise 12.2 


The dual balanced incomplete block design (BIBD) has the parameters k = r = 4
and λ = 2. The design is


(1,2,4,6); (1,2,5,7); (1,3,4,7); (1,3,5,6); (2,3,4,5); (2,3,6,7); (4,5,6,7). 
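
The stated parameters can be verified by counting directly. The sketch below checks the block size k, the number of replications r of each treatment and the number λ of blocks in which each pair of treatments occurs together. The variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

blocks = [
    (1, 2, 4, 6), (1, 2, 5, 7), (1, 3, 4, 7), (1, 3, 5, 6),
    (2, 3, 4, 5), (2, 3, 6, 7), (4, 5, 6, 7),
]

# Block sizes (k), replications per treatment (r) and pair frequencies (lambda).
block_sizes = {len(block) for block in blocks}
replications = Counter(treatment for block in blocks for treatment in block)
pair_counts = Counter(
    pair for block in blocks for pair in combinations(sorted(block), 2)
)

print("k:", block_sizes)
print("r:", set(replications.values()))
print("lambda:", set(pair_counts.values()), "over", len(pair_counts), "pairs")
```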


Exercise 12.3 
We choose m = 2 and obtain 


v = 85, b = 85, r = 21, k = 21, λ = 5.


Exercise 12.4
We choose m = 1, which supplies


v = 64, b = 336, r = 21, k = 4, λ = 1.


Exercise 12.5
Analogously to Example 12.3, a BIBD is obtained with the parameters
v = 12, b = 22, r = 11, k = 6, λ = 5.


Exercise 12.6
The parameters of the original BIBD are
v = 8, b = 56, r = 21, k = 3, λ = 6.
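
The parameter sets of Exercises 12.3 to 12.6 can be checked against the two well-known necessary conditions for a BIBD, vr = bk and λ(v − 1) = r(k − 1). These conditions are necessary but not sufficient, so the sketch is only a plausibility check. The function and dictionary names are illustrative.

```python
def satisfies_necessary_conditions(v, b, r, k, lam):
    """Necessary (not sufficient) conditions for a BIBD with parameters
    (v, b, r, k, lambda): v*r = b*k and lambda*(v-1) = r*(k-1)."""
    return v * r == b * k and lam * (v - 1) == r * (k - 1)


# Parameter sets (v, b, r, k, lambda) as given in the solutions.
designs = {
    "Exercise 12.3": (85, 85, 21, 21, 5),
    "Exercise 12.4": (64, 336, 21, 4, 1),
    "Exercise 12.5": (12, 22, 11, 6, 5),
    "Exercise 12.6": (8, 56, 21, 3, 6),
}

checks = {name: satisfies_necessary_conditions(*params) for name, params in designs.items()}
for name, ok in checks.items():
    print(name, ok)
```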




Exercise 12.7 


In the LS (Latin square) 


QAamNmAmMBRwWDH 
aR QQaQAbwbtm 
s,Qqgmseams 
>aQmaymab 
Qweemrm BO 
amped mw OQ 
moor Qqain 


the columns have to be exchanged such that the sequence A, B, C, D, E, F, G
appears in the first row.
Exercise 12.8 


If we delete the last two columns of the LS of Exercise 12.7, we get the
design


QanmNmaA mM BRBWDHL 
moa Qaewom 
banmwanms 
>aQmaqynm bs 
Qwewemm BO 


This is not a Youden design, since, for example, the pair (A, B) occurs four times,
while the pair (A, E) occurs only three times.


Index Mathematical Statistics 


a 
Acceptance region 80, 142 
Addition table 587, 588, 591 
Admissible 32, 71 

decision function 28, 32 
Affine a-resolvable 578 
Akaike criterion 489 


Alternative hypothesis 80, 83, 88, 


96, 103, 111, 536 
Allocation 28, 454, 488 
Analysis of covariance 495 
Analysis of variance 179, 193, 

207, 341, 403, 573 

one-way 215, 293, 542 
three-way 276, 315 
nested 279 
mixed 286 
two-way 232, 315 
nested 267 
Analysis of variance 
Method 203, 300, 310 
Mixed model 200, 202 
Model I 200, 215, 513, 536 
Model II 200, 293 
table 187, 188, 220, 248 
Ancillary statistic 19 
Antitone likelihood ratio 100 
A-optimal design 338 
Approximate confidence 
interval 297, 310 


Approximately normally distributed 


163, 173, 413, 446, 529 
A priori distribution 9, 31, 41 


Approximate test 128, 149, 160, 404 


Arc tangent function 484 
a-resolvable 578 
a-similar 84, 110 

on the boundary 84, 112 
a-similar test 85, 110 
ASN 149, 161, 166 

empirical 168 

function 169 
Association scheme 596 
Asymptotic 

covariance matrix 446, 451, 

454, 485 

relative efficiency 69, 72 

variance 72, 440, 463, 482 
Asymptotically unbiased 72 
a-test 84, 354, 514 

most powerful 85 
Disconnected cross
classification 244 
Discrete design 338, 395, 444 
Distribution 
beta 33, 145, 613 
binomial 10, 15, 18, 22, 50, 58 
exponential 32, 34, 613 
gamma 16 
geometric 33 
hypergeometric 33, 613 
negative binomial 33 
normal 22, 51, 517, 521, 529, 
534, 610 
Pareto 33, 614 
Poisson 32, 107, 610, 614 
uniform 32, 613 
Weibull 33, 614 
D-optimal 
concrete 457 


design 396, 397, 458, 468 
locally 455, 484, 487 
Dual BIBD 578 
Dunnett 
method 555 
procedure 559, 560, 563 
Dunn’s method 547 


e 
Effect 208, 231, 282, 341 
conditional 235 
fixed 53, 179, 207, 215, 231 
random 53, 199, 293,
336, 351
size 86, 87, 117 
Efficiency 1, 69 
asymptotic relative 72 
function 69 
relative 52, 69 
Efficient estimator 52 
Elementary BIBD 575 
Empirical 2, 136 
ASN function 168 
Bayes method 9 
Bias 463 
power function 169 
Equivalent likelihood function 
12, 57 
Equivariant 54 
estimator with minimal MSD 
56, 71 
Error 
distribution 447 
of (the) first kind 79, 82, 83 
of (the) second kind 79, 82, 83 
random 60, 236 
term 155, 274, 282, 289, 344, 
379, 429, 453 
variance 356 
Error probability 
comparisonwise 541 
experimentwise 541, 559 
theory 61 


Estimable Function 194, 198, 213, 
217, 221, 239, 499, 543 
Estimated asymptotic covariance 
matrix 446, 451 
Estimation 39, 40 
consistent 71 
Jackknife 63, 488 
point 39 
unbiased 41 
of variance components 202, 
293, 300 
Estimator 40 
BAN 74, 611 
best asymptotic normally 
distributed 74 
best quadratic unbiased 315, 611 
efficient 52 
Hodges-Lehmann 68 
Jackknife 63 
linear 41 
linear unbiased 41 
locally variance optimal 
unbiased 45 
with minimal MSD 56, 71
quadratic unbiased 41 
unbiased 42 
uniformly variance optimal 
unbiased 45, 48, 49 
of variance components 315 
variance-invariant 53 
variance optimal unbiased 43 
Euclidian 
geometry 584 


norm 203 
space 8 
Exact 


D-optimal design 396 
test 145, 354, 567 
Expectation vector 124, 194, 437, 
519, 610 
Expected 
length 174 
random loss 29 
sample size 157
width 140, 408, 414, 452, 545 
Experiment 
factorial 207, 569, 603 
sequential 2, 147 
Experimental design 28, 567 
optimal 336-338, 394 
statistical 415 
Experimental unit 148, 568 
Exponential distribution 613 
Exponential family 9, 613 
five-parametric 51 
four-parametric 17, 124 
k-parametric 9, 11, 18, 48, 57 
one-parametric 19, 27, 48, 99, 103 
three-parametric 125 
two-parametric 125 
Exponential regression 458, 463 
function 490 


f 
Factor 3, 207, 215 
fixed 230, 355 
levels 207, 232, 273, 569 
nested 260, 279, 344 
noisy (nuisance) 122, 568, 571 
random 293, 348, 355, 362, 390 
superordinated 260, 279, 344 
Factorial experiment 207 
fractional 603 
Factor level 3, 207 
combination 3, 208, 230 
F-distribution 138, 225, 320, 609, 
610, 624 
non-central 228, 230
Finite measure 11 
Finite projective geometry 582 
First order interaction 274 
Fisher information 1, 20, 21, 69, 74, 
156, 630 
Fixed sample size 83, 154, 165 
Fleishman system 453, 466 
Frequency distribution 8 
g
Galois field 582 


Gamma distribution 16, 628, 634 
Gauss-Markov theorem 181 
Gauss-Newton method 425 
Generalised 

inverse 193, 500 

Tukey method 553 
General linear model 179, 209, 496 
Global minimum 30, 425, 429 
Global R-optimal 29, 68 
Gompertz function 476 
G-optimal design 397, 457 
Group family 55 


h 
Hadamard matrix 592 
Hartley’s algorithm 425 
Hartley’s procedure 426, 431 
Hodges-Lehmann estimator 68 
Hypergeometric distribution 
33, 613 
Hypothesis 80 
composite 80 
linear 179 
one-sided 103, 121, 147 
testable 195, 213, 221, 244, 
270, 500 
two-sided 103, 121 


i
IBM SPSS statistics 208 
Idempotent matrix 181, 185, 198, 
308, 500 
Identifiable 423 
Inadmissible 32 
Incidence matrix 571, 574, 577, 
586, 597 
Incomplete 
block design 244, 567, 572, 604, 611 
cross classification 232, 243, 
324, 496 


Indifference zone 516, 536 


Inferior factor see nested factor 
Inflection point 438, 469, 473, 476, 
479, 484, 487 
Information 1,5, 9, 20, 69, 74, 156, 285 
matrix 24, 75, 455, 474 
Interaction 207, 280, 235, 239, 250, 
273, 276, 285, 293, 326, 344 
of first order 274 
of second order 274, 278 
Internal regression 431, 474 
Interval estimation 32, 139 
Intrinsically nonlinear regression 424 
Function 424 
Invariant 55, 154 
estimation 53, 195, 202, 305 
test 154, 188 
Isotone likelihood ratio 101 
Iteration 121, 165, 353, 425, 651 


j 
Jackknife criterion 409, 488 
Jackknife estimation 63, 488 


k 
k-dimensional normally 
distributed 52, 413 
k-parametric exponential family 9, 12, 
16, 57, 112 
Kronecker product design 574, 594 
Kurtosis 203, 453, 465 


l
Latin rectangle 600 
Latin square 600, 611 
Lattice square 600, 601 
Least Squares 
estimator 382, 422, 611 
method 60, 76, 180, 198, 274, 380, 
417, 424, 611 
L-estimator 66 
Level 


of a block factor 571, 600 


of a factor 3, 6, 207, 215, 231, 244,
260, 273, 293, 315, 341, 
495, 603 
Levene test 139 
Likelihood 
decomposition 14 
function 8, 11, 57, 149, 154, 163 
equivalent 12, 57 
ratio 150, 154 
antitone 99 
isotone 99 
monotone 99 
ratio test 149, 156 
statistic 14 
Linear 
Combination 190, 195, 201, 241, 
296, 344 
Contrast 189, 221, 396 
estimator 182 
hypothesis 179, 185, 192 
model 61, 179, 191, 199, 293 
regression 282, 379 
statistical model 61 
subspace 179, 185 
transformation 181 
unbiased estimator 182
Linearity parameter 456
Locally
A-optimal 455
Co,-optimal 455, 468
D-optimal 455, 456, 466, 484
optimal design 488
R-optimal 29
variance-optimal unbiased
estimator 45, 611
Location parameter 63, 66
Logistic 


function 39, 383, 429, 438, 473 
regression 429, 431 
Log-likelihood function 153, 155, 
158, 184 
Loss 29 
expected random 39 
random 29, 39
Loss function 28, 39, 57, 200 
quadratic 39, 454 


m 
Main effect 230, 273, 293, 362 
Mallows criterion 488 
Mann-Whitney test 133 
Mathematical model 8, 79, 207, 216 
Maximin size 230, 255 
Maximum-likelihood 611 
estimation 203 
restricted 203 
modified 203 
estimator 70, 75, 155, 611 
method 1, 57, 60, 184, 295, 353 
restricted 203 
Maximum minimal size see 
Maximin size 
Mean 40, 517, 542, 544, 546, 552 
difference 548, 557 
loss 39 
square deviation (MSD) 55, 69, 
70, 611 
trimmed 66 
Winsorised 67 
Measure 
finite 11 
Measurable mapping 10, 28, 84, 
139, 445 
Median 66, 67, 68, 72, 77 
M-estimator 67, 68 
Method of moments 62, 63, 76
Minimal
function 583, 586-588
MSD 56, 71
norm 203
sample size 31, 175, 416, 559, 563
sufficient 9, 14-19, 33, 47, 48,
56, 58, 60, 116
Minimax
decision function 32, 36, 631
estimator 41
Minimin size 231 
Minimum 
difference of practical interest 
global 30, 425, 429 
local 30 
probability 531 
relative 337, 425 
sample size 120, 121 
Minimum chi-squared (y’) 
method 61, 62 
modified 62 
Minimum 7’ estimator 62 
MINQUE 203, 204, 295, 305, 308, 
313, 339, 611 
Mixed classification 272, 282, 286, 
288, 289, 292, 334, 349, 369, 
370, 372-374 
Mixed model 199, 295, 341, 348, 
350, 354, 360, 364, 366, 368, 
375, 416 
of analysis of variance 200 
of regression analysis 200, 416 
ML-estimator 57, 58-60, 76—78, 312 
Model 79, 173, 198 
of the analysis of covariance 
(ANCOVA) 204, 495, 496, 
503, 506, 571, 654 
of analysis of variance 
(ANOVA) 179, 188, 193, 
207, 215, 232, 293, 300, 341 
mixed 348 
Model I 
of the analysis of variance 
(ANOVA) 200, 207, 209, 
267, 286, 290 
of (multiple) linear regression 
379, 381, 384, 385, 400 
of regressions analysis 199, 377, 
417, 421 
Model II 199 
of the analysis of variance 
(ANOVA) 40, 200, 293, 294, 
315, 326, 334 


208 


of (multiple) linear regression 410 
of regression analysis 199, 
377, 417 
of selection 515 
Modified 
maximum-likelihood 
estimation 203 
minimum-y” method 62 
Monotone likelihood ratio 96, 
99, 103, 533, 534 
m-point design 395, 454, 455 
discrete 395 
MS-estimation 220, 256, 267, 276, 
281, 286, 290, 343, 359 
Multiple
comparisons 536, 556, 548, 553,
556, 559
decision problem 513, 514,
536, 541
linear regression 384, 385, 410
problem 433
t-procedure 540, 557, 560, 562
Multistage sampling 5, 6
Mutually orthogonal LS (MOLS) 602


n 
Natural parameter 9, 12, 17, 57, 59, 
103-112, 124, 125, 137 
Negative binomial distribution 
33, 613 
Nested classification 233, 260, 279, 
324, 334, 358, 365 
Neyman-Pearson lemma 84, 87-88, 
91, 96, 171 
Neyman structure 111-113
Non-central
CS-distributed 63, 186, 280
F-distributed 187, 193, 196, 219,
400, 501
t-distributed 111, 117, 120, 126
Non-centrality parameter 117, 120,
126, 186, 189, 231, 502
Non-linearity
measure 441, 442


parameter 421, 422, 456, 458, 
469, 473, 487 
Non-linear regression function 40, 
378, 379, 384, 421, 437, 447, 458 
Non-parametric test 134, 176 
Normal 
distribution 11, 13, 22, 54, 72, 

154, 163, 302, 517, 534, 536 
n-dimensional 184, 209 
two-dimensional 16, 51, 124 

equation 193, 194, 203, 210, 211, 

216, 238, 242—246 

Null space 189, 639 
Number of replications 255, 568,
572-575, 601


o
One factorial (experimental) 
design 569, 570 
One-parametric exponential 
family 19, 27, 34, 48, 99, 
103, 107, 113, 149 
One sample 
problem 126, 161 
t-test 137 
One-sided 
confidence interval 140, 147 
hypothesis (test) 94, 96, 99, 113, 
126, 404 
Open sequential test 149
Optimal
choice of sample sizes 126, 338, 415
choice of support points 454-458
experimental design 336-338,
394-397, 454
Optimal design of experiments
(OPDOE) 119, 121, 126, 153,
161, 208, 231, 250, 278, 520,
559, 604
Optimality criterion (condition) 57,
143, 336, 395, 455
Order statistic 64-68, 551
Orthogonal
(linear) contrast 189, 227, 541
polynomials 385-388
projection (projector) 180, 182, 187,
190, 197, 400, 434
Orthonormal basis 183, 190
Outlier 66


p 


Parameter 
space 16, 29, 53, 96, 113 
vector 9, 51, 144, 189 
Parametrisation 159, 437, 487 
Pareto distribution 33, 614 
Partial 
correlation coefficient 411, 412 
regression coefficient 412 
Partially balanced incomplete block 
design see PBIBD 
Partially non-linear regression 
function 421 
PBIBD 244, 573, 596 
Permutation 514, 602 
p-factorial (experimental) design 569 
Pitmann efficiency 71 
Point estimation 2,32, 39-77, 139, 209 
Poisson distribution 107, 610 
Polynomial regression 
function 385, 388 
Population 2, 8, 341, 517, 530 
statistical 2-4 
Power 83, 118, 355, 451 
function 84, 85-87, 102, 104, 118, 
123, 150, 169 
p-point design 456-458 
Practically interesting minimal
(minimum) difference 86, 117 
Precision requirement 117, 118, 126, 
128, 147, 165, 230, 255 
Prediction 
best linear unbiased (BLUP) 200 
Primary unit 5, 6 
Primitive element 
Probability 
distribution 8, 11, 14, 31 
function 8, 11 
measure 41, 139, 395 
Program 

OPDOE see OPDOE 

package R 119, 208 
Projection 

Orthogonal see orthogonal 

projection 


Pseudo random number 545, 646 


q 
Quadratic
estimation (estimator) 41, 53,
202, 305
form 186, 187, 219, 267, 294,
305, 351
loss function 39, 55, 454
unbiased estimator 202, 203, 315
Quantile 81, 90, 98, 110, 111, 120
Quasilinear
polynomial regression 388, 405
regression 388, 421, 443
function 384, 385


r 
Radon-Nikodym density 11
Random 4, 360, 361, 377, 410 
loss 29, 39 
sample 1, 6, 8, 9 
variable 1,8 
Randomisation 568, 569, 572, 601, 
602, 656 
complete (or unrestricted) 569 
Randomised test 83, 84, 96, 145 
Random sampling (procedure) 4—7 
stratified 4, 6, 7 
unrestricted 6 
Random sampling 
with replacement 6 
without replacement 6, 7 
Range 64, 66 
augmented studentised 553 
studentised 550, 554 


Rank 
space 191, 197, 210, 379, 
543, 610 
statistic 64 
vector 66 
Rao-Blackwell theorem 47 
Rao-Cramér inequality 26, 27, 35, 
69, 70 
Rayleigh distribution 172 
RCDs 567, 600-603, 611 
Realisation 8, 9, 14, 28 
Rectangle 
Latin 600, 601 
Latinised 600, 601 
Rectangular distribution 170 
Region critical 80 
Regressand 202, 378, 417, 421 


Regression 
analysis 31, 179, 191, 199, 377, 
417, 421 
coefficient 384, 391, 411, 506 


partial 412 

coefficient within classes 458, 504 

internal 431, 474, 482, 485 

intrinsically non-linear 393, 421, 
431, 443, 447, 456 

line 381, 383, 393, 402, 405, 
409, 417 

logistic 427, 429-431 

model 200, 396, 405, 456, 473, 
479, 495 

multiple linear 384, 385, 410, 433

problem 433

quasi-linear (see Quasilinear
regression)

quasi-linear polynomial (see
Quasilinear polynomial
regression)

simple linear 381, 396, 400, 401

Regression function 378, 384, 458

exponential 463

intrinsically non-linear 421, 431,
447, 456


non-linear 384, 421, 431, 435, 
447, 456 
polynomial 385 
quasilinear 384, 385 
Regressor 200, 377, 410, 421, 495 
Rejection region 80, 115, 142 
Relative 
efficiency 52, 69, 72 
minimum see Minimum, relative 
Reparametrisation 159, 220, 440, 473 
optimal 443 
Replication 255, 568, 601 
Residual variance 312, 314, 407, 433, 
462, 476, 488 
R-estimator 68 
Restricted maximum likelihood 
estimator 203 
Restricted maximum likelihood method 
(REML) 203, 295, 304, 315, 
353, 362 
Result 
asymptotic 69 
Richards function 487 
Risk 29-31, 36, 79, 117 
of the first kind 79, 117 
function 29, 30, 32, 39 
of the second kind 79, 117 
Risk function 
of the first kind 84 
of the second kind 84 
Robust 136, 452 
Robustness 32, 124, 135 
Row-column designs (RCDs) 567, 
600-603, 611 


S

Sample 3 
censored 4 
concrete 3 
median 66, 67, 72 
random 4, 8
representative 3 
small 69 
space 8, 10, 14, 19
Sample size 5 
determination 208 
expected 157 
fixed 83, 154, 165 
minimal 31, 416, 559, 563 
Sample variance 111, 118, 535, 627 
Sampling 4 
arbitrary 5 
cluster 4, 7 
multistage 5, 6
procedure 4, 6 
random 4, 5
with replacement 6 
sequential 5 
simple random 4 
stratified 4, 6, 7 
systematic with random start 6 
without replacement 6 
Scale parameter 533 
Scatter plot 462, 463, 471, 480, 
483, 486 
Scheffé method 542, 552, 554, 562 
Schwarz criterion 489 
Schwarz’s inequality 26, 27, 
46, 260 
Secondary unit 5, 6
Second order interaction 274, 
275, 278 
Selection 417, 517, 530 
of the normal distribution with 
the largest expectation 534 
of the normal distribution with 
the smallest variance 535 
problem 79, 515 
procedure 514 
rule 32, 515, 517, 530, 563 
Sequential 
likelihood-ratio test (SLRT) 149, 156
test 147, 159, 414
triangular test 160
t-test 153
Sequential path 160
Sequential test 
closed 149 
open 149 
Side condition 210, 237, 254, 280, 
294, 319, 341, 358 
Simple 
analysis of variance 488, 513, 536 
classification 
of analysis of covariance 503
of analysis of variance (one-way) 297
experimental design 569
linear regression see Regression, simple linear
PBIBD 599
Simulation experiment 136, 138, 146, 329, 451, 452, 463
Simultaneous
confidence interval 543
confidence region 408
(normal) equations 57, 62, 107, 155, 214, 319, 425


Size 
of experiment 
maximin 230, 255, 282, 355 
mean 150 
minimin 230 
minimal see minimum sample size
Skewness 24, 134, 137, 408, 440, 
453, 465 
SLRT 149, 612 
Small sample 3, 5, 69, 160, 478 
Solution surface see Expectation 
surface 
Space 
Euclidean 8
SPSS 108, 129, 146, 208, 228, 244, 271, 
295, 314, 356, 390 
SS see Sum of squares 
between 218, 227, 266, 284 
total 218, 275, 402 
within 218, 266, 285 


Standard 
normal distribution 81, 110, 135, 
414, 518, 610 
State space 28 
Statistic 10 
ancillary 19 
sufficient 10, 20, 23, 47 
Statistical 
decision problem 28 
decision theory 28 
experimental design 28 
model 179, 568 
test 5, 79 
Stratified sampling 4, 6, 7 
Stratum 6 
Strength of a sequential test 148
Studentised
augmented range 550, 553
range 550, 554
Student’s test 116, see also t-test
Subclass number 244, 281, 302
equal 315, 326
unequal 315
Subset formulation 516
Sufficiency 9, 11
Sufficient statistic 9, 11, 14, 23
Sum of squares 188, 220, 260, 313, 320, 358, 547, 612
of deviation see MSD
Support 68, 395, 434, 444, 453, 466, 488
of an experiment 28
Systematic sampling 6


T
Tangent hyperbolic (tanh) 
function 479 

four parametric 479 

three parametric 438, 473 
Tangent plane 434 
Taylor expansion 155 
t-distribution 406 

central 111, 117, 120, 126, 451, 461 


Test 
α-similar 84, 110
approximate 128, 149, 158, 404 
invariant 154, 188 


with Neyman structure 
see Neyman structure 
non-parametric 134 
of parallelism 405, 416 
randomised 83, 96, 145 
statistical 5, 79 
uniformly best 85 
Testable hypothesis 196, 214, 245, 250
Three-parametric tanh (hyperbolic tangent) function see Tangent hyperbolic function
Three-point design 466
Three-way analysis of variance 272
cross classification 272, 282, 288, 290
mixed classification 282
nested classification 279
Total mean 207, 215, 218, 234, 247, 261
t-procedure 
multiple 540, 557, 560 


Trapezium (trapezoidal) method 433 
Treatment 2, 147, 160, 215, 417, 
458, 555 
Treatment factor 569, 571, 603 
Triangular design 599 
Triangular test 159, 160 
sequential 160 
Trimmed mean 66 
Trivial BIBD 576, 578 
Truncation 417 
t-Test 115, 127, 136, 154, 404 
Tukey method 550 
generalised 553
Tukey procedure 558, 560 
Two-dimensional normal 
distribution 16, 51 
Two-parametric exponential family 125, 153
Two-point distribution 10 
Two-sample 
problem 124, 133, 158 
t-Test 126 
Two-sided alternative 103, 105, 121
Two-way
analysis of variance 232
cross classification 233
incomplete cross classification 232
nested classification 260


U 
UMPU α-test 104, 105
Unbiased 
estimator 41, 52, 69 
with minimal variance 47, 182, 
305, 315 
α-test 106, 123
Unequal subclass number 
see Subclass number, unequal 
Uniform 
convergence 444 
distribution 613 
Uniformly best 
optimal 40 
test 85 
unbiased test 106, 123 
Uniformly distributed 95 
Universe 4, 208, 216, 293, 417 
Unrestricted random sample see 
Random sampling, 
unrestricted 


V 
Variance 
asymptotic 72 
component (estimation) 202, 239, 300, 310
invariant 53
optimal 42
optimal unbiased 43, 44, 69 
optimal unbiased estimator 44 
Vector space 181, 211, 397 


W

Weibull distribution 614 

Weight (function, discrete) 41, 395


Welch test 127, 145, 296 

Wilcoxon test 133 

Winsorised mean 67, see also Mean, 
winsorised 

Within class correlation 
coefficient 300 


Y
Youden design 600, 612 


